ABSTRACT


Global  queueing  network  performance  models  are  developed  for  the  increasingly  important 
class  of  computer  networks  comprising  a  number  of  independent  computing  systems  sharing  a 
single  resource.  An  extensive  bibliography  and  survey  of  prior  work  relating  to  this  topic  are 
included.  Analytic  expressions  of  performance  measures  for  this  class  of  systems  are  derived  from 
the  general  theory  of  multi-class  queueing  networks,  and  new  computational  algorithms  for 
evaluating  them  are  presented  that  are  memory-space  efficient  (linear  vs.  exponential)  compared 
with  known  algorithms  for  the  general  theory.  This  exact  analytic  model,  called  the  Shared 
Central  Server  Model,  incurs  approximately  the  same  exponential  time  complexity  in  its 
evaluation  as  do  all  models  based  on  the  general  theory;  because  of  this,  a  simple  heuristic 
approximate  model  of  this  class  of  systems  is  also  presented  that  is  computationally  efficient  in 
both  time  and  space.  Modular  expansion  of  this  class  of  systems  is  investigated  using  the 
approximate  model,  and  a  useful  relationship  is  derived  between  the  number  of  additional 
independent  computing  systems  and  the  incremental  increase  in  capability  of  the  shared  resource 
required  to  maintain  the  existing  level  of  system  performance. 


Key  words:   Approximate  queueing  models;  computer  architecture;  modular  expansion 
analysis;  performance  evaluation;  performance  modeling;  queueing  models;  queueing 
networks.


I. INTRODUCTION


A. Resource Sharing

Resource  sharing  is  an  old  concept.  It  exists  in  every  industry  and  facet  of  life.  The  basic 
motivation  for  resource  sharing  is  the  existence  of  a  scarce  resource,  caused  primarily  by  economic 
or  physical  considerations. 

In  the  computer  industry  there  has  been  significant  interest  in  distributed  processing  [IDC  76] 
and  a  proliferation  of  computer  networks  [COTTON  79,  LEUNG  78,  MONAHA  79,  SPRING 
78, WILKES 79]. The primary goal of these schemes is to provide various mechanisms for
resource sharing [KAHN 72, ROBERT 70]. In these environments the scarce resources include
information  and  capabilities  (programs  or  processes)  as  well  as  devices. 

The  cost  of  processing  power  and  primary  memory  is  currently  decreasing  at  a  rate  of  50% 
per  year,  while  the  costs  of  communication  facilities,  secondary  memory,  peripherals  and  special 
purpose  devices  are  decreasing  at  a  rate  of  10%  per  year  or  less  [BBN  79].  LSI  techniques  and 
mass  production  have  been  the  primary  causes  of  these  cost  reductions.  This  has  resulted  in  two 
basic  system  development  strategies  that  can  be  generally  associated  with  opposite  ends  of  the 
computer cost spectrum. These two strategies can be categorized as more-for-the-same-cost and
the-same-for-less-cost.

The trend on the upper end of the cost spectrum is for the cost not to decrease, but to
compensate  by  providing  increased  speed,  capacity,  and  capability.  New  generations  of  more 
powerful  systems  are  being  offered  which  include  parallel  architectures  based  on  array  and 
pipeline  concepts.  Previously  their  cost  would  have  been  prohibitive.  As  a  result,  their 
predecessors  of  the  previous  generation  are  subsequently  being  offered  on  the  market  at  discount 
prices. 

On  the  lower  end  of  the  cost  spectrum  actual  cost  reductions  are  being  offered.  These 
basically comprise microcomputer and minicomputer systems, as well as some individual
components. With each cost reduction the acquisition of these systems becomes more feasible for
an  increasing  segment  of  the  business  community  and  the  general  public. 

The  cost  reduction  in  microcomputer  and  minicomputer  systems  as  compared  to 
maxicomputer  systems  has  encouraged  acquisition  of  many  independent  small  systems  vs.  a 
single  large  one.  Functional  and  administrative  separation,  along  with  the  lower  price,  has  also 
fueled  this  trend  [IDC  76].  But  as  applications  become  more  sophisticated  and  complex  they  tend 
to  require  more  data  from  other  parts  of  the  organization  and/or  more  computational  power.  In 
this  environment  the  small  systems  can  handle  the  local  and  overhead  processing,  allowing  the 
large  system  to  be  utilized  efficiently  to  provide  computational  power  and  large  storage 
repositories. 

Secondary  storage  systems,  peripherals,  and  special  purpose  devices  are  now  the  scarce 
resources  within  and  between  computing  systems,  rather  than  the  processors  and  primary 
memories as was previously the case. The increasing economies of scale in secondary storage
technology have made the concept of a large pooled storage subsystem attractive [WATSON 80].
One of the objectives of a Back-End Network [CHAMPI 80, LAM 79, WATSON 80] is to provide
high speed access to global peripherals and storage subsystems [CHAMPI 80, WATSON 80]. The
Octopus network at Lawrence Livermore Laboratory and NASA's Skylab network are two
examples of this type of architecture [THORNT 80]. Several mass storage subsystems are
currently on the market (e.g. IBM 3850 and CDC 38500). A recent mass storage subsystem
design from Nippon Electronics Co. [SEKINO 79] listed shared use by multiple independent
computer  systems  as  one  of  its  primary  design  criteria.  Although  large  expensive  systems  have 
always  been  a  scarce  resource,  they  are  now  being  shared  by  low  cost  small  systems.  Some 
organizations  that  had  more  than  one  independent  computing  system  (ICS)  have  integrated  them 
[IDC  76].  Their  motivation  has  been  to  reduce  costs  by  sharing  resources  (data,  programs,  and 
devices).  Other  organizations,  in  similar  circumstances,  are  investigating  the  feasibility  and 
benefits  of  embarking  on  similar  integration  efforts  [IDC  76]. 

Two  popular  approaches  to  performing  this  type  of  integration  are  through  a  common  shared 
secondary  memory  system  [CDC  75]  or  a  local  area  network  (LAN)  [CARPEN  79].  Neither  of 
these  approaches  excludes  the  other.  The  shared  secondary  memory  approach  requires  a 
multiplexing  device  between  the  ICSs  and  the  shared  secondary  memory.  Some  intelligence  is 
also required either in each ICS or in the multiplexor to handle the synchronization and lock-out
mechanisms  necessary  to  accommodate  simultaneous  access  to  common  objects.  If  the  ICSs  are 
not  local,  then  communication  facilities  are  also  required.  If  the  ICSs  are  within  a  few  kilometers, 
then  this  communication  facility  could  be  a  LAN. 

The  LAN  approach  requires  a  communication  medium  and  a  number  of  interface  units,  at 
least  one  for  each  device  placed  on  the  network.  It  is  desirable  for  these  interface  units  to  have 
some  intelligence,  so  that  the  operation  of  the  LAN  is  kept  relatively  transparent  to  each  device. 
A  device  on  the  network  may  be  an  ICS,  a  shared  secondary  memory,  or  any  other  device  that 
may  be  shared  or  desires  to  share  the  devices  on  the  network. 


B.  Objectives 

We  have  briefly  discussed  the  rationale  for  resource  sharing  within  and  between  computer 
systems.  Two  approaches  that  have  been  used  to  accomplish  resource  sharing  were  also  discussed. 
The  resulting  architectures  of  these  approaches  facilitate  modular  expansion  by  allowing  the 
addition  of  ICSs,  as  well  as  making  the  sharing  of  data  easier.  It  is  our  objective  in  this 
dissertation  to  investigate  these  types  of  systems  which  share  resources.  Although  our  previous 
discussion  has  been  primarily  concerned  with  secondary  memory  as  the  shared  processing 
resource  (SPR),  it  is  applicable  to  any  SPR.  Our  focus  will  be  on  an  architecture  consisting  of  a 
single SPR shared among a number of ICSs. Our investigation will be concerned with how the
performance of each ICS depends on the processing rate of the SPR and the number of
ICSs. To accomplish this we will construct a queueing model, and develop the expressions for the
desired  performance  measures. 

Little,  if  any,  analysis  on  this  class  of  architecture  has  appeared  in  the  literature.  Most  of  the 
analyses  are  concerned  with  the  individual  subsystems,  rather  than  the  overall  system. 
Investigations into the performance evaluation of ICSs [BRANDW 77, BUZEN 71], secondary
memory subsystems [CHANG 72, COFFMA 68A, HOOGEN 77], and general communication
subsystems [FRANK 72, KLIENR 64, KOBAYA 77, WONG 78] have been and continue to be carried out.
We feel it is important to provide designers and analysts with the results of an analysis of this class of
architectures, while in addition providing them with useful analytic tools.

By  taking  this  global  analysis  viewpoint  we  will  not  directly  take  into  account  the  effects 
caused  by  various  strategies  within  the  subsystems,  such  as  the  effects  of  different  communication 
protocols.  We  will  assume  that  the  communication  subsystem  (multiplexors,  LAN,  etc.)  has 
sufficient  bandwidth  so  that  it  may  be  disregarded  as  a  bottleneck  for  performance  evaluation 
purposes [THORNT 80]. Generally the communication delay times are insignificant when compared to the
processing  delays  of  the  various  devices  comprising  the  system.  When  these  times  are  significant, 
then  the  processing  delays  of  the  devices  can  be  extended  to  incorporate  them. 


C.  Organization  of  Dissertation 

Chapter II discusses some of the related and previous work in the computer queueing
performance evaluation area. Appendix A provides a short review of the mathematics of
queueing network theory. Appendices B and C provide a glossary of terms and a definition of
mathematical notation used in this dissertation, respectively. Due to the rather extensive use of
mathematics in this dissertation the reader is urged to refer to Appendix C whenever unfamiliar
notation is encountered.

In  chapter  III  a  queueing  network  model  is  developed  for  the  system  architecture  we  have 
introduced  here,  along  with  relevant  performance  measures.  Efficient  computational  algorithms 
are  presented  for  the  evaluation  of  these  performance  measures. 

Previously the evaluation of queueing network models required memory-space and time
complexity both growing exponentially with the size of the state-space. The algorithms we
develop to evaluate our model require memory-space that grows linearly with the size of the
state-space, although the time complexity still grows exponentially. This gives the designer and
analyst the ability to evaluate this model when it has a large state-space, provided they are willing
to invest the computation time; previously this may not have been possible due to physical
memory-space limitations.

Chapter  IV  presents  an  approximate  model  for  this  class  of  architectures  that  is  significantly 
more  efficient  in  computation  time  and  memory-space  complexity  than  is  the  exact  model  of 
chapter  III.  The  associated  performance  measures  and  an  efficient  computational  algorithm  to 
evaluate  them  are  also  presented.  The  results  of  this  approximation  are  compared  to  those  of  the 
exact  model. 

The  performance  measures  predicted  by  this  approximate  model  do  result  in  a  varying 
relative  error,  which  we  consider  to  be  within  acceptable  engineering  limits.  The  efficiency  gained 
in  their  evaluation  is,  for  most  applications,  thought  to  be  an  acceptable  compromise  for  the  error 
incurred.  For  situations  with  extremely  large  state-spaces,  it  may  be  the  only  analysis  method 
possible.  As  a  result  we  provide  the  designer  and  analyst  the  capability  to  use  the  approximate 
model  to  obtain  estimates  of  the  performance  of  a  large  number  of  system  configurations  in  a  very 
short  period  of  time.  Once  a  small  number  of  candidate  configurations  are  culled,  the  exact  model 
may  be  applied  to  obtain  more  accurate  performance  predictions. 

In chapter V both the exact and approximate models are analyzed to obtain relationships
expressing response/delay times as a function of the number of ICSs and the processing rate of the
SPR. The analysis of the exact model is shown to produce only an upper bound which is too high
to  be  of  any  practical  use.  The  analysis  of  the  approximate  model  yields  a  very  useful  and 
intuitively  satisfying  relation  between  the  addition  of  ICSs  and  the  incremental  increase  in  SPR 
processing  rate  required  to  maintain  system  response  time.  This  result  will  be  useful  to  designers 
and  analysts  when  they  consider  building  new  systems  or  augmenting  existing  systems  which  are 
based  on  this  class  of  architecture.  These  results  are  then  applied  to  two  design  situations  to 
provide  examples  of  the  utility  of  this  model. 

Chapter  VI  summarizes  the  salient  results  of  this  dissertation.  Some  directions  for  future 
potential  research  extensions  are  also  discussed. 


II.  MODELING  CONCEPTS 


A.  Related  Areas 


Prior to setting forth the proposed shared processing resource (SPR) architecture, two related areas
were investigated. One is the development and status of the theory that is to be applied toward
the modeling endeavor. The second is the study of similar situations in prior work.

Queueing  network  theory  will  be  applied  toward  developing  our  SPR  model.  Existing 
queueing  network  modeling  techniques  and  programming  facilities,  other  than  product  form, 
reasonably  allow  analysis  of  systems  consisting  of  a  few  thousand  states.  These  techniques  are 
generally  limited  by  algorithm  complexity  and  machine  resources.  There  is  currently  no 
indication  that  significantly  larger  state-spaces  will  be  accommodated  in  the  near  future 
[CHANDY  78].  However,  systems  whose  structure  conforms  to  the  requirements  of  a  product 
form  solution  may  accommodate  state-space  sizes  many  times  larger  than  other  queueing  network 
modeling  techniques.  In  the  next  section  we  review  the  area  of  queueing  network  theory. 

Modeling  requires  one  to  be  concerned  with  two  levels  of  abstraction.  The  first  level  is  where 
the  analyst  specifies  a  descriptive  model  by  selecting  "key"  aspects  of  the  actual  system.  The 
second  level  is  where  the  analyst  formulates  or  applies  an  analytic  model  to  represent  the 
descriptive  model.  In  formulating  both  these  levels  of  models  various  simplifying  assumptions  are 
generally  made.  In  applying  these  assumptions  the  resultant  models,  on  either  level,  stray  from 
accurately  portraying  the  actual  system.  In  the  general  literature,  as  in  this  dissertation,  a  first 
level  descriptive  model  is  presented  and  a  second  level  analytic  model  is  formulated  to  closely  or 
exactly  represent  that  descriptive  model.  But  the  accuracy  of  the  analytic  model  depends  on  how 
well  the  descriptive  model  represents  the  actual  system. 

In certain situations this presents a dilemma to the analyst: whether to formulate an inaccurate
descriptive model for which an analytic model can provide an exact solution, or to formulate an
accurate descriptive model for which an approximate analytic model can provide an inaccurate
solution. Both of these approaches yield inaccurate results for the actual system. The more
correct  approach  remains  an  open  question,  to  be  handled  on  a  case-by-case  basis. 

The approach chosen by a given analyst depends on many factors, such as time, analytic tools
and techniques both familiar and available, as well as the level of confidence in them. Chandy
[CHANDY 78] suggests the analyst's selection criteria are, in descending order of importance, (1)
solution speed, (2) credibility, and (3) degree of accuracy. Based on this ordering it seems that
analysts  will  sacrifice  accuracy  for  quick  results.  This  should  be  further  qualified.  An  analyst  will 
sacrifice  accuracy  in  return  for  a  quick  solution  if  it  will  provide  insight  into  the  behavioral  trends 
of  the  actual  system.  Therefore,  one  may  conclude  that  approximate  models  that  provide  a  fast 
solution  and/or  more  closely  represent  a  faithful  descriptive  model  of  an  actual  system  are  always 
in  demand. 

The  major  limitations  of  current  queueing  network  theory  can  be  placed  in  two  categories, 
size  and  structure.  Size  limitations  are  concerned  with  the  time  and  memory-space  complexity 
required  to  obtain  solutions  to  the  models  of  systems  which  have  large  state  spaces.  Structure 
limitations  are  concerned  with  systems  whose  operational  structure  does  not  conform  to  the  basic 
assumptions and requirements necessary to be modelled by queueing network theory. As a result
of these limitations, various approximations have been proposed to model systems which
suffer from one or more of them. Depending on the actual
system  and  the  limitations  one  is  attempting  to  overcome,  these  approximate  models  produce 
varying  degrees  of  success  and  utility.  In  section  C  we  review  some  of  these  approximation 
techniques. 

Although we know of no previous work on our SPR architecture, a somewhat
similar  situation  is  the  study  of  memory  interference  (MI)  [BASKET  76,  BHANDA  73,  BURNET 
70,  HOOGEN  77,  MCCRED  73,  OSTERW  72,  RAU  79,  SMITH  77].  Briefly,  this  environment 
consists  of  n  processors  sharing  m  primary  memory  modules.  The  main  analysis  endeavor  is  to 
determine  the  resultant  effective  memory  bandwidth  available  to  the  processors.  This  class  of 
descriptive  models  differs  in  three  primary  aspects  from  the  descriptive  operational  structure  of 
our  SPR  architecture. 


First,  in  MI  the  processor  and  memory  are  tightly  coupled  such  that  processors  operate 
directly  out  of  primary  memory,  and  their  interactions  occur  within  a  few  clock  cycles.  In 
contrast,  the  SPR  is  a  separate  device  which  is  loosely  coupled,  and  interactions  require  hundreds 
or  thousands  of  clock  cycles.  For  MI,  any  delay  caused  by  interference  from  another  processor 
attempting  to  access  the  same  primary  memory  module  results  in  the  blocked  processor  becoming 
inactive. Work cannot progress without the information from primary memory. The exceptions
to this are machines that have buffered look-ahead and/or prefetch environments. This
resulting  inactivity  applies  as  well  to  multiprogrammed  processors,  since  the  context  cannot  be 
switched  due  to  the  nonexistence  of  a  quiescent  state  and  the  prohibitive  overhead  which  would 
be  incurred.  For  the  SPR,  the  processor  formulates  a  request  for  service  to  the  SPR  according  to 
some  type  of  message  protocol.  Even  if  the  SPR  is  not  busy  and  the  request  does  not  encounter 
any  additional  delays  other  than  that  required  to  perform  the  actual  service,  the  processor  expects 
a  relatively  large  amount  of  time  to  elapse  before  receiving  the  response.  During  this  time  the 
processor  can  then  attempt  to  continue  processing  (i.e.  overlap)  or  if  multiprogrammed  can  switch 
context  and  proceed  to  process  another  job. 

Second, MI is concerned with the interference occurring at the access to the individual
memory modules and is not concerned with the job flow or direct delays to any other devices.
The SPR architecture is concerned with the job flow and the direct delays a job incurs as it flows
through  the  system.  The  integrity  of  maintaining  the  proper  flow  paths  is  a  prime  consideration  of 
the  SPR  model.  MI  has  no  flow  path,  other  than  the  implied  processor  to  primary  memory  and 
return,  and  has  no  reason  to  maintain  one.  Therefore,  after  the  primary  memory  services  the 
processor's  request  there  is  no  distinction  or  accounting  as  to  which  processor  the  job  returns  to. 

Third, MI generally assumes two or more primary memory modules and two or more
processors, all of which are identical. The SPR architecture assumes only a single SPR and one
or  more  processors,  each  of  which  may  be  distinctly  different  and  have  a  number  of  peripheral 
processors. 

The  basic  approach  in  constructing  MI  analytic  models  has  been  as  follows.  Assume  a 
probability  distribution  for  the  primary  memory  access  pattern  and  a  relation  between  the 
processor and memory cycle time. From these derive the probability of a processor making a
request to each of the memory modules. Then apply combinatorics to weight these probabilities
to  determine  the  mean  number  of  busy  memory  modules  and  the  expected  unit  execution  rate  of 
the  processors. 
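
To make the combinatorial step concrete, the following Python sketch evaluates the simplest textbook instance of this approach. It is an illustration only and is not taken from any of the cited MI studies. It assumes that in every memory cycle each of the n processors issues exactly one request, chosen uniformly and independently among the m modules, and that blocked requests are discarded rather than retried. The function name is hypothetical.

```python
def expected_busy_modules(n_processors: int, m_modules: int) -> float:
    """Mean number of busy memory modules in one cycle.

    Assumption (illustrative): each processor independently requests one
    module, chosen uniformly at random, and conflicts are simply dropped.
    """
    # A given module stays idle only if none of the n processors selects it.
    p_idle = (1.0 - 1.0 / m_modules) ** n_processors
    # Linearity of expectation: sum the busy probability over all m modules.
    return m_modules * (1.0 - p_idle)


if __name__ == "__main__":
    for n, m in [(2, 2), (4, 4), (8, 8)]:
        print(n, m, round(expected_busy_modules(n, m), 3))
```

Under these assumptions two processors sharing two modules keep 1.5 modules busy on average, so interference costs a quarter of the peak memory bandwidth.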

A  different  approach  was  used  by  McCredie  [MCCRED  73]  which  was  based  on  queueing 
network theory. There are many differences between McCredie's model and the SPR architecture
which make this model inapplicable. A single class of jobs is assumed and therefore no flow
path integrity between processors can be maintained. There are multiple identical
shared devices and only one job per processor is allowed. McCredie relied on Buzen's algorithm
[BUZEN  71]  to  evaluate  this  model,  which  is  efficient  since  only  a  single  job  class  was  involved. 


B.   Queueing  Networks 

Queueing  network  models  were  originally  developed  as  an  aid  to  the  management  sciences  for 
"jobshop"  flow  problems.  This  specific  class  of  problems  consisted  of  those  in  which  various  work 
requests  flow  through  a  network  of  service  centers,  forming  waiting  lines  (queues)  at  each 
depending  upon  the  density  of  traffic.  As  computing  systems  became  more  sophisticated,  by 
distributing  functions  in  channels  and  controllers,  it  was  recognized  that  the  executing 
characteristics  (flow)  of  a  job  could  be  thought  of  as  migrating  between  these  service  centers 
(CPUs,  channels,  controllers,  etc.)  and,  therefore,  could  be  modelled  using  queueing  networks. 

There  are  two  basic  research  approaches  to  queueing  network  theory,  probabilistic  and 
algebraic.  The  probabilistic  or  decomposition  [DISNEY  73]  approach  decomposes  the  network 
into  subnetworks,  solves  the  stochastic  flow  (arrival  and  departure  processes)  of  each  subnetwork 
independently,  and  then  recombines  the  results  to  produce  the  overall  stochastic  flow.  This 
approach  has  the  advantage  of  allowing  quite  general  rules  concerning  the  arrival  processes  and 
service distributions, which may be non-Markovian in nature. The main disadvantage is a
consequence  of  their  general  stochastic  nature,  which  usually  results  in  intractable  equations 
yielding  no  closed  form  solution.  Currently,  solutions  are  available  for  networks  consisting  of 
only  two  or  three  service  centers. 


The  algebraic  [WALLAC  73]  approach  represents  the  state  probabilities  as  a  set  of 
homogeneous  algebraic  equations  to  be  solved.  The  state  is  a  vector  description  of  the 
distribution  of  jobs  among  the  service  centers  of  the  network.  The  main  advantage  of  the 
algebraic  approach  is  the  ability  to  handle  networks  with  a  large  number  of  service  centers.  The 
main  disadvantage  of  this  approach  is  the  restrictive  assumptions  on  arrival  processes  and  service 
distributions,  generally  Markovian. 

The  algebraic  approach  has  been  applied  using  various  techniques.  A  separation  of  variables 
technique  has  been  applied  by  Jackson  [JACKSO  63].  Some  numerical  evaluation  techniques 
have  been  attempted.  This  approach  expands  the  system  state-space  description  so  that  the  model 
may  remain  Markovian,  but  also  increases  the  size  of  the  state-space.  The  equilibrium  state 
probability  equations  are  formulated  and  then  solved  numerically  using  techniques  such  as  the 
relaxed Jacobi iteration [WALLAC 66] method and more recently the Gauss-Seidel [GAVER 76]
method.  Analytic  solutions  represent  a  special  subclass  of  Markovian  networks,  but  yield  very 
efficient  computations  due  to  their  structural  properties.  Numerical  solutions  on  the  other  hand 
can  handle  any  Markovian  network,  but  are  limited  by  the  size  of  their  state  space.  In  this 
dissertation  we  are  mainly  concerned  with  analytic  solutions  to  the  algebraic  approach  of 
modeling  systems. 
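
As an illustration of the numerical route, the sketch below applies Gauss-Seidel sweeps to the equilibrium equations pi Q = 0 of a very small Markovian network: K jobs cycling between two exponential FCFS service centers. The generator construction, function names, tolerance, and example rates are illustrative assumptions and are not taken from [GAVER 76]. For this two-center case the equilibrium probabilities are known to be geometric in the number of jobs at the first center, which gives a simple check.

```python
def cyclic_two_center_generator(K, mu1, mu2):
    """Generator matrix Q for K jobs cycling between two exponential centers.

    State n (0..K) is the number of jobs at center 1; the rest are at center 2.
    """
    Q = [[0.0] * (K + 1) for _ in range(K + 1)]
    for n in range(K + 1):
        if n > 0:
            Q[n][n - 1] = mu1  # center 1 completes a job, which moves to center 2
        if n < K:
            Q[n][n + 1] = mu2  # center 2 completes a job, which moves to center 1
        Q[n][n] = -sum(Q[n])   # diagonal makes every row sum to zero
    return Q


def gauss_seidel_steady_state(Q, tol=1e-12, max_sweeps=10_000):
    """Iteratively solve pi Q = 0 with sum(pi) = 1 for a small generator Q."""
    size = len(Q)
    pi = [1.0 / size] * size
    for _ in range(max_sweeps):
        previous = pi[:]
        for j in range(size):
            # Balance for state j: flow out equals flow in, using the newest values.
            inflow = sum(pi[i] * Q[i][j] for i in range(size) if i != j)
            pi[j] = inflow / -Q[j][j]
        total = sum(pi)
        pi = [p / total for p in pi]  # renormalize after every sweep
        if max(abs(a - b) for a, b in zip(pi, previous)) < tol:
            break
    return pi


if __name__ == "__main__":
    K, mu1, mu2 = 4, 2.0, 1.0
    numeric = gauss_seidel_steady_state(cyclic_two_center_generator(K, mu1, mu2))
    weights = [(mu2 / mu1) ** n for n in range(K + 1)]
    analytic = [w / sum(weights) for w in weights]
    for n, (a, b) in enumerate(zip(numeric, analytic)):
        print(n, round(a, 6), round(b, 6))
```

The state vector here has only K + 1 entries. For a general network the same sweep must visit every state, which is why the size of the state space limits the numerical approach.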

A  general  queueing  network  consists  of  a  set  of  service  centers  arbitrarily  connected,  each 
with  a  queue  and  an  arbitrary  but  fixed  number  of  servers.  The  network  is  referred  to  as  closed  if 
no  new  jobs  arrive  or  leave,  but  a  constant  number  continuously  circulate.  If  arrivals  and 
departures  are  allowed,  permitting  the  overall  number  of  customers  to  vary,  the  network  is 
referred  to  as  open.  Operationally,  a  job  arrives  at  a  service  center  and  is  placed  in  a  queue,  until  a 
server,  according  to  some  scheduling  policy,  is  available  to  provide  the  required  service.  Upon 
completion  of  service,  the  job  transits,  with  no  delays  and  a  constant  known  routing  probability,  to 
another  service  center.  This  sequence  is  repeated  at  each  service  center. 

A queueing network is specified by the number of service centers, the number of servers at
each center, their service time probability distributions and scheduling policy, a probability
transition matrix, and either the number of jobs in the network (if a closed network) or an arrival time
probability distribution (if an open network). The network equilibrium state probability
distribution is computed from the above parameters and is subsequently used to compute various
performance  measures.  Performance  measures  of  general  interest  include  mean  queue  length, 
busy  probability,  and  throughput  (jobs/unit-time). 

The "classical" algebraic solution technique, in general, requires simplifying assumptions to
allow the system to be modeled as a Markovian network having a manageable solution. The
service and arrival processes of jobs at each service center are assumed to be statistically
independent and identically distributed (iid). Service times are assumed to be exponentially
distributed. The arrival process is assumed to be Poisson, which implies the time interval between
consecutive arrivals has an exponential distribution. The scheduling policy is assumed to be
first-come first-served (FCFS). This allows the probabilistic flow rate into and out of a state to be
expressed as a simple linear function of time, the mean service rate, and the mean arrival rate.
From the resultant expressions for the probabilistic flow rates a set of simultaneous state balance
equations  is  formed,  and  a  product  form  solution  is  assumed  which  eventually  reduces  to  a  set  of 
simultaneous  linear  equations.  Appendix  A  provides  a  brief  review  of  this  solution  technique. 

Jackson  [JACKSO  57]  presents  one  of  the  earliest  works  on  solving  the  equilibrium  state 
probabilities  of  an  open  queueing  network.  A  fairly  general  network  is  assumed.  It  consists  of  a 
set of service centers each containing an arbitrary number of servers. Arrivals into the network
follow a Poisson process, service times are iid exponential random variables, and service is
rendered on a FCFS basis. A solution for the equilibrium state probability distribution is
presented.  Jackson  points  out  the  similarity  between  the  form  of  this  solution  and  that  of  an 
elementary  single  service  center  with  multiple  servers,  under  the  same  arrival  and  service 
assumptions. 
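
A minimal sketch of how such a solution is evaluated is given below, restricted for brevity to one server per service center (Jackson's result also covers multiple servers). The mean arrival rate at each center is first obtained from the traffic equations lambda_i = gamma_i + sum_j lambda_j p_ji. Each center then contributes the factor (1 - rho_i) rho_i^n_i of an isolated single-server queue, with rho_i = lambda_i / mu_i. The example network, its rates, and all function names are assumptions made for illustration.

```python
def solve_traffic_equations(gamma, P, sweeps=500):
    """Fixed-point iteration for lambda = gamma + lambda P (open network).

    gamma[i] is the external Poisson arrival rate at center i and P[j][i]
    is the routing probability from center j to center i.
    """
    s = len(gamma)
    lam = list(gamma)
    for _ in range(sweeps):
        lam = [gamma[i] + sum(lam[j] * P[j][i] for j in range(s)) for i in range(s)]
    return lam


def jackson_state_probability(state, lam, mu):
    """Product-form probability of `state` for single-server exponential centers."""
    prob = 1.0
    for n, arrival_rate, service_rate in zip(state, lam, mu):
        rho = arrival_rate / service_rate
        if rho >= 1.0:
            raise ValueError("center is saturated; no equilibrium exists")
        prob *= (1.0 - rho) * rho ** n
    return prob


if __name__ == "__main__":
    gamma = [1.0, 0.0]               # jobs enter only at center 0
    P = [[0.0, 0.5],                 # center 0: half go to center 1, half depart
         [1.0, 0.0]]                 # center 1: always returns to center 0
    mu = [4.0, 2.0]
    lam = solve_traffic_equations(gamma, P)
    print("arrival rates:", [round(x, 6) for x in lam])
    print("P(empty network):", round(jackson_state_probability((0, 0), lam, mu), 6))
```

The fixed-point iteration is used here only because it is short. Any linear solver applied to the traffic equations would serve equally well.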

Jackson  [JACKSO  63]  later  extended  this  model  by  incorporating  arrival  and  service  time 
distributions  which  are  functions  of  the  queue  length.  In  addition  he  generalized  the  upper  and 
lower  limits  on  the  number  of  customers  in  the  network,  which  were  previously  infinity  and  zero, 
respectively.  He  introduced  the  concepts  of  "triggered  arrivals"  and  "service  deletions"  to 
accomplish this. Triggered arrivals occur when the total number of customers drops below a
specified  threshold,  thus  triggering  immediate  arrivals  of  new  customers  to  replace  those  that  left. 
Service  deletions  occur  when  the  length  of  a  queue  exceeds  some  maximum  threshold.  New 
customers arriving at this queue are given zero service time and continue on their routing.

Gordon and Newell [GORDON 67A] independently produced a result similar to Jackson's
[JACKSO  63]  in  that  they  derived  the  equilibrium  state  probabilities  for  a  closed  queueing 
network.  Gordon  and  Newell  represent  this  closed  job  flow  system  as  an  irreducible  Markov 
process,  consisting  of  a  constant  number  of  customers  whose  service  time  distributions  are 
exponential.  They  present  the  state  balance  (difference)  equations  for  this  system.  By  assuming 
the solution is of a product form and utilizing a separation of variables technique, they obtain a set
of  linear  simultaneous  equations  of  the  form  E=E[P].  Since  [P]  is  the  transition  probability 
matrix,  which  is  stochastic,  a  solution  to  the  simultaneous  equations  exists.  The  solution  to  these 
simutaneous  equations  can  then  be  substituted  back  into  the  assumed  product-form  solution  for 
the  equilibrium  state  probabilities.  The  product  form  solution  incorporates  a  normalization 
constant  whose  purpose  is  to  force  these  product  terms  to  be  proper  probabilities  that  sum  to 
unity.  Thus  the  normalization  constant  can  be  solved  for  by  summing  the  product-form  terms 
over the entire state space. The state space grows exponentially, O[(K + 1)^s], with the number of
jobs,  K,  and  the  number  of  service  centers,  s,  in  the  network.  Therefore,  this  presents  a  nontrivial 
computational  requirement. 
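
The cost of this summation can be illustrated with a short Python sketch. It assumes fixed-rate (load-independent) service centers, so that each state contributes the product of the terms x_i^n_i; the function names and the sample values are hypothetical and are not taken from [GORDON 67A].

```python
from itertools import product
from math import comb

def closed_states(K, s):
    """Yield every way of placing K jobs on s service centers."""
    for head in product(range(K + 1), repeat=s - 1):
        if sum(head) <= K:
            # the last center receives whatever jobs remain
            yield head + (K - sum(head),)

def brute_force_G(x, K):
    """Normalization constant obtained by summing the product-form
    terms over the entire state space (assumes fixed-rate centers)."""
    total = 0.0
    for state in closed_states(K, len(x)):
        term = 1.0
        for load, jobs in zip(x, state):
            term *= load ** jobs
        total += term
    return total

K, s = 5, 3
n_states = sum(1 for _ in closed_states(K, s))
# The exact state count is a binomial coefficient, bounded above by (K + 1)**s.
print(n_states, comb(K + s - 1, s - 1), (K + 1) ** s)
```

Every state must be visited once, so the work grows with the size of the state space; this is the cost that the iterative methods discussed next avoid.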

A  simple,  nontrivial  case  of  a  closed  queueing  network  model  is  that  of  the  Central  Server 
model.  Buzen's  [BUZEN  71]  work  on  the  central  server  model  is  among  the  initial  applications  of 
closed  queueing  network  theory  to  computer  systems.  The  model  consists  of  a  CPU  (the  central 
server)  and  a  set  of  peripheral  processors  (PPUs)  which  service  a  set  of  continually  circulating 
jobs,  see  Figure  III-2  of  chapter  III.  The  behavioral  characteristics  of  a  job  are  as  follows.  A  job 
requests  service  from  the  CPU.  If  the  CPU  is  busy,  the  job  must  wait  in  a  FCFS  queue.  Once  the 
CPU  service  request  is  satisfied,  the  job  then  transits  to  one  of  the  PPUs  or  back  to  the  CPU  with  a 
fixed  probability.  After  service  is  completed  at  the  PPU,  the  job  proceeds  with  probability  one  to 
the  CPU.  The  probability  transition  matrix,  [P],  for  the  central  server  model  consists  of  a  single 
nonzero  row  and  column,  see  Figure  A-2  of  Appendix  A.  Based  on  Gordon  and  Newell's 
technique  the  solution  to  the  resulting  set  of  simultaneous  equations  is  obtainable  by  inspection. 
Substituting  this  solution  into  the  product  form  yields  the  expression  for  the  equilibrium  state 
probabilities;  one  must  still  solve  for  the  normalization  constant,  which  is  a  nontrivial  task. 
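
The statement that the simultaneous equations can be solved by inspection can be checked numerically. The sketch below is illustrative only: it assumes device 0 is the CPU and devices 1 onward are the PPUs (the ordering used in Figure A-2 is not reproduced here), and the function names are hypothetical.

```python
def central_server_matrix(p):
    """Transition matrix of a central server model.
    p[0] is the CPU-to-CPU probability and p[j] (j >= 1) the
    CPU-to-PPU j probability; the entries of p must sum to one."""
    n = len(p)
    P = [[0.0] * n for _ in range(n)]
    P[0] = list(p)            # the single non-zero row: leaving the CPU
    for j in range(1, n):
        P[j][0] = 1.0         # the single non-zero column: PPUs return to the CPU
    return P

def visit_frequencies_by_inspection(p):
    """Relative visit frequencies with the CPU frequency chosen as 1."""
    return [1.0] + list(p[1:])

def satisfies_balance(e, P, tol=1e-12):
    """Check the simultaneous equations E = E[P] term by term."""
    n = len(e)
    return all(
        abs(e[j] - sum(e[m] * P[m][j] for m in range(n))) <= tol
        for j in range(n)
    )

p = [0.25, 0.5, 0.25]
e = visit_frequencies_by_inspection(p)
print(e, satisfies_balance(e, central_server_matrix(p)))
```

Any positive multiple of e also satisfies the equations, which is why only relative visit frequencies are obtained at this step.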


Buzen  introduced  an  efficient  iterative  procedure  to  solve  for  the  normalization  constant. 
Instead of growing exponentially, the computational complexity of Buzen's algorithm grows as
O[Ks], where K is the number of jobs and s is the number of service centers in the network. Buzen
further  derived  expressions  for  the  busy  probability,  throughput  and  mean  queue  length  of  each 
service  center  (CPU  and  PPUs).  Buzen  also  developed  a  similar  central  server  model  which 
allowed  the  mean  service  rates  to  be  arbitrarily  dependent  on  queue  lengths,  but  still  required 
exponential  distributions. 
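
A minimal Python sketch of the iterative (convolution) procedure for fixed-rate service centers follows. The function names are hypothetical, the relative loads x are assumed to be given, and the queue-dependent variant mentioned above is not covered.

```python
def buzen_G(x, K):
    """Return [G(0), ..., G(K)] for fixed-rate centers with relative loads x.
    One pass per service center over K entries gives O(Ks) operations
    while storing only K + 1 values."""
    g = [1.0] + [0.0] * K              # network with no centers: only G(0) = 1
    for load in x:                     # fold in one service center at a time
        for k in range(1, K + 1):
            g[k] += load * g[k - 1]    # g_new(k) = g_old(k) + load * g_new(k-1)
    return g

def utilizations(x, K):
    """Busy probability of each center: x_i * G(K-1) / G(K)."""
    g = buzen_G(x, K)
    return [load * g[K - 1] / g[K] for load in x]

def mean_queue_lengths(x, K):
    """Mean number of jobs at each center: sum over k of x_i**k * G(K-k) / G(K)."""
    g = buzen_G(x, K)
    return [sum(load ** k * g[K - k] for k in range(1, K + 1)) / g[K] for load in x]

print(buzen_G([1.0, 0.5], 2))
print(utilizations([1.0, 0.5], 2))
print(mean_queue_lengths([1.0, 0.5], 2))
```

The mean queue lengths sum to K, which is a convenient sanity check on any implementation.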

Moore  [MOORE  72]  independently  had  applied  Gordon  and  Newell's  [GORDON  67]  work 
to  modeling  computer  systems.  His  approach  to  obtaining  an  efficient  solution  for  the 
normalizing  constant  was  based  on  a  partial  fraction  expansion  method.  The  complexity  of  this 
method is O[Ks^2]. Buzen's iterative method is less complex, O[Ks], and also more versatile.
Reiser  and  Kobayashi  [REISER  75]  have  generalized  Moore's  solution  technique  for  mixed 
networks  (i.e.  a  mix  of  both  open  and  closed  subchains)  and  removed  several  of  the  previous 
modeling  constraints.  They  provide  a  general  algorithm  based  on  multiplication  of  power  series, 
which  can  be  viewed  as  a  multi-dimensional  linear  filter.  In  an  independent  effort  Lam  [LAM 
77B]  had  extended  Moore's  solution  technique  to  include  nondistinct  traffic  intensities. 

The  concept  of  "local  balance"  was  introduced  by  Chandy  [CHANDY  72];  the  previous 
works  by  Jackson,  Gordon  and  Newell,  and  Buzen  were  based  on  "global  balance."  Local  balance 
is  a  subset  of  global  balance  in  which  one  concentrates  on  the  flow  through  a  single  queue,  rather 
than  through  all  queues.  More  specifically  it  requires  equivalent  terms  on  one  side  of  the  balance 
equation  to  equal  those  on  the  other  side,  instead  of  the  more  general  solution  to  the  equation. 

Chandy  has  also  shown  that  some  other  service  time  distributions  and  scheduling  policies 
yield  the  same  results  as  the  exponential  distribution  with  a  FCFS  scheduling  policy.  Therefore, 
only  the  mean  service  time  is  required  in  the  product-form  solution  for  these  other  distributions 
and  corresponding  scheduling  policies.  The  four  cases  that  Chandy  identifies  as  having 
equivalent  solutions  to  the  exponential  distribution  are:  (1)  exponential  service  distributions  and 
FCFS  scheduling;  (2)  geometric  service  distributions  whose  Laplace  transform  is  rational  and  a 
processor  sharing  scheduling  policy;  (3)  a  service  distribution  whose  Laplace  transform  is  rational 
and  a  processor  sharing  scheduling  policy;  and  (4)  a  service  distribution  whose  Laplace  transform 
is rational and a last-come-first-served-preemptive-resume (LCFS) scheduling policy. One way to
understand the relationship between case 1 (previously the standard) and the others is to realize
that  a  rational  Laplace  transform  inverts  to  a  sum  of  exponential  distributions  [COX  55],  in  other 
words,  a  hypo-  or  hyper-exponential  distribution.  A  hypo-exponential  distribution  is  realizable  as 
a  set  of  serial  exponential  stages,  while  a  hyper-exponential  distribution  is  realizable  as  a  set  of 
parallel  exponential  stages. 
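
The practical difference between the two stage arrangements shows up in the squared coefficient of variation. The sketch below uses the standard moment formulas for serial and parallel exponential stages; the function names and sample rates are hypothetical.

```python
def cv2_hypoexponential(rates):
    """Squared coefficient of variation of serial exponential stages.
    The stage times add, so means and variances add as well."""
    mean = sum(1.0 / r for r in rates)
    variance = sum(1.0 / r ** 2 for r in rates)
    return variance / mean ** 2

def cv2_hyperexponential(probs, rates):
    """Squared coefficient of variation of parallel exponential stages,
    where stage k is selected with probability probs[k]."""
    mean = sum(q / r for q, r in zip(probs, rates))
    second_moment = sum(2.0 * q / r ** 2 for q, r in zip(probs, rates))
    return second_moment / mean ** 2 - 1.0

print(cv2_hypoexponential([2.0, 2.0]))               # serial stages: at most 1
print(cv2_hyperexponential([0.5, 0.5], [0.5, 2.0]))  # parallel stages: at least 1
```

A single stage gives a value of exactly 1 in either arrangement, which recovers the exponential case.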

Another  attempt  to  extend  the  range  of  probability  distributions  of  the  central  server  model  is 
due  to  Baskett  and  Gomez  [BASKET  72].  Using  an  approach  similar  to  Chandy's  [CHANDY  72] 
"case-3"  for  the  CPU,  a  service  time  distribution  with  a  rational  Laplace  transform  and  a  processor 
sharing  scheduling  policy,  they  introduced  the  coefficients  of  variation  into  their  model.  For  this 
model  they  derived  the  equilibrium  state  probabilities  of  the  network  which  are  identical  to 
Buzen's; therefore, all of Buzen's results extend to this variation.

Baskett  and  Muntz  [BASKET  73]  extended  Chandy's  [CHANDY  72]  local  balance  model  by 
incorporating  multiple  classes  of  customers.  The  allowable  classes  are  obtained  from  the  four 
cases  presented  by  Chandy.  Each  service  center  contributes  a  factor  dependent  on  its  class  to  the 
product  form  solution.  The  equilibrium  state  probability  distribution  is  presented  in  two  forms,  a 
detailed  form  which  denotes  customer  class  per  service  center  and  an  aggregate  form  of  total 
customers per service center; the latter exists only under specific conditions. Due to the properties
of  local  balance  the  marginal  state  distributions  for  open  networks  are  obtained  in  closed  form. 
The  resemblance  between  these  marginal  distributions  and  those  of  single  server  systems  is 
striking. The marginal distributions are equivalent to those of an M/M/1 queue (a single server
queue with Poisson arrival and exponential service time distribution), and the exception (infinite
server, and iid rational Laplace transform) is equivalent to that of an M/G/1 queue (a single
server queue with Poisson arrival and general service time distribution).

The  more  recent  results  of  queueing  network  theory,  using  multiple  job  classes,  are  applicable 
to our SPR architecture. The primary limitation is the memory-space complexity closely followed
by  the  time  complexity  of  obtaining  solutions.  The  state-space  for  these  models  grows  in  an 
exponential  manner.  Although  efficient  algorithms  exist  for  these  models,  the  SPR  architecture 
and  many  other  systems  have  state-spaces  too  large  to  be  reasonably  evaluated  by  these 
algorithms, especially when multiple job classes are involved.

C. Approximations


The  current  queueing  network  limitations  of  size  and  structure  have  provided  the  motivation 
for  approximate  modeling  techniques.  The  thrust  of  the  approximation  techniques  has  attacked 
the structure limitation. This thrust has occurred because previously most subsystems had a
state-space whose size was reasonable to evaluate using existing queueing network techniques, but
whose  operational  structure  was  not.  Therefore,  rather  than  applying  an  accurate  analytic  model 
to  an  inaccurate  descriptive  model,  approximate  analytic  models  were  formulated  to  approximate 
an  accurate  descriptive  model.  As  a  result  of  these  approximation  techniques  some  economy  on 
the  size  limitation  has  also  been  realized. 

These approximation techniques can be categorized as decomposition or substitution; neither
precludes the use of the other. Decomposition separates the system into various pieces; each piece
is modelled individually to obtain a constituent model, and then these constituent models are
joined together to form the overall system model. To ensure some correlation between
the  individual  constituent  models  and  the  overall  model,  some  common  relations  must  usually  be 
satisfied.  Although  each  individual  constituent  model,  and  usually  the  overall  system,  separately 
satisfy  these  relations  they  do  not  do  so  in  a  consistent  manner.  Therefore,  techniques  to 
coordinate  consistent  interrelations  are  required.  These  relations  are  generally  concerned  with  the 
flow  and  capacity  aspects  of  the  system. 

The  basis  for  this  decomposition  into  individual  subsystems  stems  from  the  work  done  by 
Courtois  [COURTO  71].  Courtois  investigated  the  level  of  coupling  between  queues  and 
determined that the subsystem selection should be based on this parameter. Queues that are
strongly coupled should be grouped into the same subsystem, so that the coupling between
subsystems is weak. The error introduced by this approximation technique is proportional to the
degree of coupling between subsystems.

The  other  general  approximation  technique  is  substitution.  The  substitution  technique  is 
concerned  with  substituting  one  form  (or  model)  for  another  in  an  existing  solution  method.  For 
example, one might substitute an M/G/1 form for an M/M/1 queue form in an analytic model to
approximate the operation of an M/G/1 queue in the descriptive model. The concept is that the
analyst  feels  the  inaccuracies  introduced  as  a  result  of  this  approximation  are  worth  the 
expediency  of  utilizing  existing  methods  rather  than  formulating  a  new,  more  complex  model  or 
method. 

Much of the effort to extend the structure limitations has been concerned with the service
time  probability  distribution.  An  exponential  service  time  distribution  is  the  most  prominent. 
When  an  exponential  distribution  is  assumed  in  formulating  a  model,  it  contributes  to  tractable 
solutions.  This  is  primarily  due  to  two  properties  of  the  exponential  distribution.  First,  it  may  be 
specified  by  a  single  parameter,  its  mean.  Second,  it  is  memoryless,  requiring  no  previous 
information  to  determine  its  future  operation. 
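
The memoryless property can be stated in one line. Writing X for an exponential service time with rate \mu (the symbols X, s, and t are introduced here only for illustration), the remaining service time after s units have elapsed has the same distribution as a fresh service time:

```latex
P(X > s + t \mid X > s)
  = \frac{P(X > s + t)}{P(X > s)}
  = \frac{e^{-\mu (s + t)}}{e^{-\mu s}}
  = e^{-\mu t}
  = P(X > t), \qquad s, t \ge 0 .
```

Because of this, the state of a service center is fully described by the number of jobs present; the elapsed service time of the job in progress need not be recorded.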

A  wide  range  of  analytic  models  are  based  on  the  exponential  service  time  distribution 
assumption,  while  the  real  systems  they  are  applied  to  do  not  possess  such  service  times.  In  many 
of  these  cases  when  the  analytic  predictions  were  compared  to  actual  measurements  a  reasonable 
agreement  was  observed.  Although  efforts  continue  to  formulate  models  based  on  more  general 
service  time  distributions,  the  robustness  of  the  exponential  distribution  should  not  be  dismissed. 
This  robustness  has  been  discussed  [BASKET  72A]  and  investigated  [GROSS  75],  and  as  a  result 
some  quantification  of  the  expected  error  is  available  for  some  approximate  substitution  modeling 
applications. 

Decomposition  techniques  have  been  applied  by  others  [BROWNE  75],  but  these  were 
basically  specific  to  a  given  model  or  situation.  Chandy  [CHANDY  75A]  introduced  the  concept 
of  an  equivalent  queue,  analogous  to  Norton's  theorem  in  electrical  networks.  This  allows  one  to 
transform  a  subsystem  of  service  centers  into  a  single  equivalent  service  center  of  a  queueing 
network.  This  resulting  composite  service  center,  referred  to  as  the  complement,  captures  the 
interface  between  a  specific  queue  and  the  rest  of  the  network.  Chandy  has  shown  that  the 
equilibrium  queue  length  and  wait  time  distributions  of  the  non-reduced  service  centers  are 
equivalent  to  those  of  the  original  network,  provided  local  balance  [CHANDY  72B]  is  satisfied  by 
the  network.  This  means  that  for  systems  that  satisfy  local  balance,  this  technique  produces  exact 
results.  For  systems  that  do  not  satisfy  local  balance  Chandy  [CHANDY  75B]  has  generalized  the 
above  decomposition  technique  by  using  an  iterative  converging  flow  balance  relation  to  "adjust" 
the service rate (flow) of the composite queue.

The intent of this decomposition technique was to reduce the complexity of the analysis task.
If  one  were  interested  in  the  analysis  of  a  specific  service  center,  then  all  other  service  centers 
could  be  represented  by  a  single  composite  queue  resulting  in  a  two  queue  system,  which  is 
generally less complex to analyze than the larger original system. Systems whose decomposition
results  in  multi-class  composite  queues  must  also  consider  job  ordering  of  the  different  classes,  or 
sub-chains.  The  resultant  state-space  is  generally  not  decreased  and  little,  if  any,  economy  is 
realized  in  the  complexity  of  the  analysis.  In  developing  our  exact  SPR  model  in  chapter  III,  we 
have  applied  this  decomposition  technique.  Since  multiple  job  classes  are  being  dealt  with,  only 
marginal  economy  is  achieved  and  it  is  still  necessary  to  evaluate  an  extremely  large  state-space. 

Currently  the  product  form  solution  [BASKET  75]  of  queueing  network  theory  has  evolved  to 
a  structure  allowing  a  general  connectivity  of  a  variety  of  service  centers  each  of  which  may  have  a 
general  service  time  distribution  (having  a  rational  Laplace  transform),  and  may  also  have  a 
number  of  different  scheduling  policies  and  job  classes.  A  service  center  having  a  FCFS 
scheduling  policy,  however,  is  constrained  to  have  an  exponential  service  time  distribution.  In  an 
effort  to  extend  this  structure  even  further  Shum  [SHUM  77]  presented  an  "extended  product 
form"  (EPF)  solution  method.  Based  on  the  fact  that  each  product  factor  has  the  form  of  an 
M/M/1 marginal distribution, Shum postulated that a reasonable approximation would be to
replace each factor with an appropriate M/G/1 marginal distribution form. The basic
convolution  computations  and  performance  measures  of  product  form  queueing  network  theory 
are  still  applicable  to  obtaining  solutions  for  this  approximate  model. 

Utilizing  this  approximate  method  requires  an  additional  constraint  on  the  solution  of  the 
initial  set  of  simultaneous  equations.  Prior  to  this  approximation  the  solution  to  these 
simultaneous  equations  resulted  in  "relative"  visit  frequencies,  which  were  related  to  the  absolute 
visit  frequencies  by  a  multiplicative  constant.  This  multiplicative  constant  has  no  effect  on  the 
resultant  equilibrium  state  probabilities  or  the  performance  measures,  since  it  is  cancelled  out  of 
the  expression  for  the  product  form  solution.  Therefore,  the  relative  visit  frequencies  are 
sufficient  for  the  product  form  solution.  The  EPF  model,  on  the  other  hand,  requires  that  the 
"absolute"  visit  frequency  be  used  and,  therefore,  the  multiplicative  constant  must  be  obtained. 
There is no known method for computing the "absolute" visit frequencies. Shum suggests
that balancing a flow relation containing the M/G/1 substitution factors could be used to obtain
them. Once this relation is formulated a bounded binary search process could be used to satisfy a
least square error criterion. A procedure is presented for computing the M/G/1 product factors
for a general distribution with different coefficients of variation (C). When compared to other
models (machine repairman model, cyclic model, and central server model) with similar
corresponding parameters, an error analysis showed that the largest error, as a function of the
coefficient of variation, occurred in "mid-range," while exact results were obtained for C = 1
(exponential) and diminishing errors resulted for large values of C. Shum has indicated that
further work is needed to extend the EPF model to include multiple classes, and queue dependent
service times. The approximation developed by Shum is an example of the combined use of both
substitution (M/G/1 for M/M/1) and decomposition or flow equivalent techniques (used to
obtain  the  absolute  visit  frequencies).  The  EPF  approximation  is  not  applicable  for  the  SPR 
architecture  principally  due  to  the  need  to  account  for  multiple  job  classes. 

An  approximate  solution  for  queueing  networks  has  been  developed  by  Kobayashi 
[KOBAYA 74A], using the Kolmogorov diffusion equations (also known as Fokker-Planck
equations). By using a "Central Limit Theorem" argument, Kobayashi has hypothesized that
changes in queue length over a large enough time interval approximate a stochastic process with a
normal  distribution.  As  a  result,  this  queue  length  process  can  be  modeled  by  a  Wiener-Levy 
process  (or  Brownian  motion)  with  a  suitable  boundary  condition.  Equations  for  the  equilibrium 
state  probability  distribution  for  a  queue  are  developed.  When  these  results  are  compared  to  the 
known solution of an M/M/1 queue they are found to be in error. In an effort to reduce this error,
these results are modified to conform to this known solution. Using a multi-dimensional diffusion
equation this approach is extended to both open and closed networks. Once the equilibrium
state  probabilities  are  obtained  they  are  then  substituted  back  into  the  product  form  solution.  For 
the  SPR  architecture  we  have  already  assumed  an  exponential  service  process.  Therefore,  existing 
exact  queueing  network  theory  can  be  used  without  the  need  to  determine  the  probability 
distributions by using the diffusion approximation.

Using this approximate technique the transient state probability distributions are obtainable.
Kobayashi  [KOBAYA  74B]  derives  them  for  a  single-server  and  a  cyclic  queueing  system. 
Obtaining  the  transient  solution  to  general  open  and  closed  networks  is  much  more  difficult,  and 
no  method  currently  exists. 

Decomposition  or  flow  equivalent  techniques  apply  to  exact  as  well  as  approximate  models. 
In  certain  situations  these  composite  sub-models  can  significantly  reduce  the  size  of  the  overall 
system state-space, thereby allowing large system models to be evaluated much more efficiently.
There  are  some  problems  with  this  technique.  One  is  that  the  service  center  or  centers  that  one 
may  desire  to  investigate  must  usually  be  excluded  from  any  of  the  composite  sub-models. 
Another is that if any parameter of the service centers within a sub-model changes, a new sub-model
must usually be evaluated and "re-aligned" to agree with the overall model consistencies. This
re-evaluation process may negate any size economy that might otherwise have been realized. Although the
decomposition  technique  is  applicable  to  the  SPR  architecture  it  does  not  reduce  the  state-space 
size.  Also,  when  these  techniques  are  applied  to  approximate  models  the  resultant  accuracy  is 
generally  variable  and  unknown.  Chandy  [CHANDY  75]  has  indicated  a  10%  to  20%  expected 
error  for  his  approximation  technique.  Others  [SHUM  76,  KOBAYA  74A]  have  indicated  an 
error  exists,  usually  by  example  for  a  few  configurations,  but  do  not  indicate  what  one  can  expect 
for  the  general  case. 

We  have  mentioned  earlier  the  robustness  of  the  exponential  distribution.  Realizing  that  this 
assumption  does  result  in  inaccuracies  some  studies  have  been  conducted  to  investigate  them  as 
well  as  errors  caused  by  substituting  other  service  time  distributions.  Gross  [GROSS  75]  has 
investigated the resultant error when an M/M/1 queueing model is used to approximate an M/G/1
queue. He found that the resulting error was proportional to the coefficient of variation. Buzen
[BUZEN 74] has investigated the use of an M/G/1 queueing model to approximate an M/G/1/K
queue. He found that the largest error occurred when the traffic intensity was high or the value of
K was low. Buzen [BUZEN 77] also investigated the use of an M/M/1/K queueing model (a single
server queue with Poisson arrival and exponential service time distribution, with a queueing limit
of K jobs) to approximate an M/G/1/K queue (a single server queue with Poisson arrival and
general service time distribution, with a queueing limit of K jobs). His results indicate that
response time performance measures have a greater sensitivity to this approximation than do the
utilization or throughput performance measures. Also Buzen indicates that the error is
proportional to the coefficient of variation, which independently verifies similar results obtained
by  Gross. 
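
The dependence of the substitution error on the coefficient of variation can be illustrated with the textbook Pollaczek-Khinchine mean-value formula for an M/G/1 queue. This is only a sketch of the effect, not the procedure used in [GROSS 75] or by Buzen; the function names are hypothetical, rho denotes the traffic intensity, and cv2 the squared coefficient of variation of the service time.

```python
def mg1_mean_jobs(rho, cv2):
    """Mean number of jobs in an M/G/1 system (Pollaczek-Khinchine)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("traffic intensity must satisfy 0 <= rho < 1")
    return rho + rho ** 2 * (1.0 + cv2) / (2.0 * (1.0 - rho))

def mm1_substitution_error(rho, cv2):
    """Relative error in the mean number of jobs when an M/M/1 model
    (cv2 = 1) stands in for an M/G/1 queue with the given cv2."""
    exact = mg1_mean_jobs(rho, cv2)
    approximate = mg1_mean_jobs(rho, 1.0)   # reduces to rho / (1 - rho)
    return (approximate - exact) / exact

for cv2 in (0.25, 1.0, 4.0):
    print(cv2, mm1_substitution_error(0.5, cv2))
```

The error vanishes for exponential service (cv2 = 1), is positive when the true service time is less variable, and is negative when it is more variable.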

All  of  these  studies  have  helped  to  quantify  the  error  that  results  from  these  substitution 
approximations.  Therefore,  they  provide  the  analyst  with  some  quantification  of  the  error  that 
can  be  expected  when  these  substitution  approximations  are  applied.  In  a  similar  manner  we 
attempt  to  quantify  the  error  that  can  be  expected  from  the  approximate  model  presented  in 
chapter IV.

III. Shared Central Server Model
A.  The  Model 

Utilizing  the  general  theory  of  queueing  networks,  a  model  will  be  constructed  for  a  special 
class of computer architecture. The model (figure III-1) is a network consisting of a set of
independent  computing  systems  (ICS)  and  a  single  shared  processing  resource  (SPR).  Each  ICS  is 
a  central  server  system  (figure  III-2)  consisting  of  a  single  central  server  (CPU)  and  a  number  of 
peripheral  processors  (PPUs).  Each  ICS  processes  a  separate  class  of  jobs,  while  the  SPR 
processes  all  classes  of  jobs.  Class  distinction  is  the  means  by  which  the  job  flow  from  each  ICS  is 
kept  segregated.  This  model  to  be  developed  will  be  referred  to  as  the  Shared  Central  Server 
(SCS)  Model. 

The  SPR  and  the  devices  within  an  ICS  each  represent  an  intricate  subsystem,  as  do  the 
various  communication  subsystems  interconnecting  each  of  them.  The  model  does  not  directly 
incorporate  the  effects  of  various  strategies  that  may  be  utilized  within  any  of  the  subsystems,  such 
as  the  effect  of  different  communication  protocols.  Instead  these  effects  are  assumed  either  to  be 
incorporated  into  the  device  processing  time  or  to  be  insignificant  when  compared  to  the  overall 
device  processing  time. 

Specifically, this model consists of R ICSs where the i-th ICS is composed of $s_i$ devices. For
the i-th ICS, the CPU is denoted as device (i,1), and devices (i,2) through (i,$s_i$) are the PPUs. The
SPR is designated as device (i,0), or just (0), and has a device count $s_0 = 1$. The network structure
can be represented by a vector denoting the number of devices in each ICS, $S = (s_0, \ldots, s_R)$, and
the total number of devices in the network is $L = \sum_{i=0}^{R} s_i$. The network is closed, constantly circulating
and processing a total of K jobs, which are composed of separate and distinct classes, one for each
ICS; while the SPR processes every class using a FCFS scheduling policy. A vector description of
the job class allocation is $J = (J_1, \ldots, J_R)$, where $J_i$ is the maximum number of (class i) jobs in the
i-th ICS. The network state n is defined as the distribution of the various classes of jobs among all
the devices, $n = (n_{1,0}, n_{1,1}, \ldots, n_{1,s_1}, n_{2,0}, \ldots, n_{R,s_R})$, where $n_{i,j}$ is the number of class i jobs both
waiting and being processed at device j in the i-th ICS. The network job constraints are

(1a)  $n_i = \sum_{j} n_{i,j} \le J_i, \quad i > 0$

(1b)  $N_0 = (n_{1,0}, \ldots, n_{R,0})$

(1c)  $\sum_{i=1}^{R} J_i = \sum_{i=1}^{R} \sum_{j=0}^{s_i} n_{i,j} = K$, and

(1d)  $n_i = \begin{cases} \sum_{j=1}^{s_i} n_{i,j}, & i > 0 \\ \sum_{j=1}^{R} n_{j,0}, & i = 0 \end{cases}$

Figure III-3. SCS model transition probability matrix.

At  each  device  jobs  are  processed  on  a  FCFS  basis  with  an  exponential  service  time 
distribution  that  is  independent  and  identically  distributed  (iid),  and  not  dependent  on  the 
number  of  jobs  waiting  for  service.  The  j-th  device  of  the  i-th  ICS  is  characterized  by  its  mean 
processing service rate $\mu_{i,j}$, where $\mu_{i,0} = \mu_0$ for all i. The job flow (as indicated in figures III-1 and
III-2) is from the i-th CPU to one of the devices within the i-th ICS or to the SPR with a constant
known probability, $0 < p_{i,1;i,j} < 1$, where $0 \le j \le s_i$. Note that the jobs do not change class as they
transit between devices. After being processed by a PPU or the SPR, the job returns to the i-th
CPU with probability $p_{i,j;i,1} = 1$. All other transition probabilities are equal to zero. The
transition probability matrix for the SCS model is shown in figure III-3. The $p_{i,1;i,j}$'s are the only
transition probabilities that are not 0 or 1 and, therefore, they are the only transition probabilities
that need to be specified by a variable. Based on this we may shorten the subscript notation for
these transition probabilities to $p_{i,j}$. The SCS transition probability matrix can be conceptualized
as a matrix whose diagonal is a set of sub-matrices and all non-diagonal elements are zero. Each
sub-matrix is an $(s_i+1) \times (s_i+1)$ matrix, with a single non-zero row and column.
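
The structure of one such sub-matrix can be sketched in Python. The function name is hypothetical; the device ordering follows the text's convention that index 0 is the SPR, index 1 is the CPU, and indices 2 onward are the PPUs of the same ICS.

```python
def ics_submatrix(p):
    """Build the (s_i + 1) x (s_i + 1) transition sub-matrix for one ICS.
    p[j] is the probability that a job leaving the CPU moves to device j,
    with j = 0 the SPR, j = 1 the CPU itself, and j >= 2 the PPUs."""
    if abs(sum(p) - 1.0) > 1e-12:
        raise ValueError("the CPU routing probabilities must sum to one")
    n = len(p)
    P = [[0.0] * n for _ in range(n)]
    P[1] = list(p)              # the single non-zero row: departures from the CPU
    for j in range(n):
        if j != 1:
            P[j][1] = 1.0       # the single non-zero column: returns to the CPU
    return P

for row in ics_submatrix([0.25, 0.125, 0.5, 0.125]):
    print(row)
```

Only the CPU row carries free parameters, which is why the shortened notation needs just one device index per ICS.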
Within the model the job flow is assumed to occur in zero time, although in an actual system a
finite  amount  of  time  is  required.  The  communication  subsystem  for  these  architectures  generally 
has  sufficient  excess  bandwidth  to  prevent  it  from  becoming  a  bottleneck  [WATSON  80, 
THORTO  80].  If  the  communication  time  is  significant  compared  to  the  device  processing  time, 
then  the  device  processing  time  can  be  increased  to  account  for  it.  So,  this  modeling  assumption  is 
reasonable. 

With  this  "micro"  information  one  can  then  compute  the  relative  load  factor  [GIAMMO  76], 
$x_{i,j}$, for each device by solving a set of simultaneous equations as follows:

(2a)  $e_{i,j} = \sum_{m=1}^{R} \sum_{k=0}^{s_m} e_{m,k}\, p_{m,k;i,j}$, and

(2b)  $x_{i,j} = \dfrac{e_{i,j}}{\mu_{i,j}}$


where $e_{i,j}$ is the relative visit frequency to the j-th device in the i-th ICS by a class i job. For the
SCS model the relative visit frequencies can be found in terms of $e_{i,1}$, by substituting the transition
probabilities of figure III-3 into (2a). Choosing $e_{i,1} = \mu_{i,1}$ results in the following (See Appendix A
Section 3):

$e_{i,j} = \begin{cases} \mu_{i,1}, & j = 1 \quad \text{(CPU)} \\ p_{i,j}\, \mu_{i,1}, & j \ne 1 \quad \text{(not CPU)} \end{cases}$

Hence, $x_{i,1} = 1$ by choice. Using the relative load factors, the state equilibrium probabilities can be
computed  by  utilizing  the  product  form  solution  [BASKET  75]  (also  see  Appendix  A,  sections  3 
and  4) 


(3)  $P(n) = \dfrac{1}{G(J)}\, f_0(N_0) \prod_{z=1}^{L-1} f_z(n_z)$

where z is a mapping from the double device indices (i,j) to a single index (somewhat analogous
to the mapping of a Fortran two dimensional array into a linear address space), such that

(4)  $z = \begin{cases} 0, & j = 0 \quad \forall i \\ j, & j > 0, \; i = 1 \\ j + \sum_{m=1}^{i-1} s_m, & j > 0, \; i > 1 \end{cases}$, and

$L = \sum_{i=0}^{R} s_i$

The product factor is (note, the subscript order is reversed from Appendix A, Section 4)

(5a)  $f_0(N_0) = n_0! \prod_{i=1}^{R} \dfrac{x_{i,0}^{\,n_{i,0}}}{n_{i,0}!}$

(5b)  $f_z(n_z) = x_z^{\,n_z}, \quad z > 0$

The normalization constant is

$G(J) = \sum_{n} f_0(N_0) \prod_{z=1}^{L-1} f_z(n_z)$

and the sum is over the entire state space n, which is constrained as specified in (1).

A computationally efficient method of computing the normalization constant was first presented by Buzen [BUZEN 71], for a single class of jobs (see Appendix A). Others have since presented a generalization of this iterative method for multiple job classes [MUNTZ 74, GIAMMO 76, SHUM 77]. The efficient generalized computation method requires evaluation of an auxiliary function. This auxiliary function has been established as an aid to explicitly present the recursive structure of the normalization constant (Appendix A, Sections 3 and 4). This auxiliary function, for our model, is

$$ g(M;z) = g(M;z-1) + x_z\, g(M-d_i;z), \qquad 1 \le z \le L-1 $$

$$ g(M;0) = \sum_{r=1}^{R} x_{r,0}\, f_0(M-d_r) = |M|! \prod_{r=1}^{R} \frac{x_{r,0}^{\,m_r}}{m_r!} \tag{6} $$

where

$$ G(J) = g(J;L), $$

$$ g(0;z) = 1, \qquad 1 \le z \le L-1, $$

$M = (m_1, \ldots, m_R)$ is a dummy counting vector that may range over the job allocation vector, such that

$$ 0 \le m_i \le J_i, \qquad \|M\| = \prod_{i=1}^{R} (m_i+1), \qquad \text{and} \qquad |M| = \sum_{i=1}^{R} m_i, $$

$d_i = (b_1, \ldots, b_R)$ is a unit vector, such that

$$ b_r = \begin{cases} 0, & r \neq i \\ 1, & r = i \end{cases} \qquad r = 1, \ldots, R, $$

$$ f_0(M) = \sum_{r=1}^{R} x_{r,0}\, f_0(M-d_r), \qquad \text{with } f_0(0) = 1, $$

and z is the mapping of the dual indices into a single index as defined in (4).

To apply this method directly would require retaining all $\|M\|$ values of g(M;z) for at least a single z. One may conceptualize g(M;z) as an $\|M\|$ by z matrix; the retention of a single z set of values would therefore consist of a column containing $\|M\|$ elements. The resultant minimum storage for g(J;z) is

$$ \prod_{i=1}^{R} (J_i+1) \tag{7} $$

The total number of operations is on the order of

$$ 2(R+L-1) \prod_{i=1}^{R} (J_i+1) \tag{8} $$

where L is the total number of devices in the network as defined in (4). For example, consider a network with R = 6 and K = 30, allocated as J = (5,5,5,5,5,5) and S = (1,3,3,3,3,3,3); that is, 6 ICSs (i.e. 6 job classes), each comprising 3 devices, processing 30 jobs allocated as a maximum of 5 per ICS. This example would require minimum storage of $6^6$ = 46,656 words and approximately 2,239,488 operations. As the number of ICSs, R, or the number of jobs, K and J, increases, the storage requirements and operation count increase exponentially, which can be seen by inspecting (7) and (8).

Starting with the general solution and utilizing the specific structure of the SCS allows us to formulate a more efficient procedure for computing the normalization constant. The general queueing network solution for the normalization constant of (3) is

$$ G(J) = \sum_{n} f_0(N_0) \prod_{z=1}^{L-1} f_z(n_z) \tag{9a} $$

$$ = \sum_{\sum_{z=0}^{L-1} n_z = K} f_0(N_0) \prod_{z=1}^{L-1} f_z(n_z) \tag{9b} $$

$$ = \sum_{\sum_{z=1}^{L-1} n_z \le K} \Big\{ f_0(N_0) \prod_{z=1}^{L-1} f_z(n_z) \Big\} \tag{9c} $$

$$ = \sum_{\sum_{j=1}^{s_i} n_{i,j} \le J_i,\ \forall i} \Big\{ f_0(N_0) \prod_{i=1}^{R} \prod_{j=1}^{s_i} f_{i,j}(n_{i,j}) \Big\}. \tag{9d} $$


Noting the structure of the SCS we can factor out those devices which service only the R-th job class, resulting in

$$ G(J) = \sum_{\sum_{j=1}^{s_i} n_{i,j} \le J_i,\ \forall i \le R-1} \Big\{ \prod_{i=1}^{R-1} \prod_{j=1}^{s_i} f_{i,j}(n_{i,j}) \Big[ \sum_{\sum_{k=1}^{s_R} n_{R,k} \le J_R} f_0(N_0) \prod_{k=1}^{s_R} f_{R,k}(n_{R,k}) \Big] \Big\} \tag{10} $$

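Concentrating on the inner-most factor of (10), and substituting (5) yields

$$ \sum_{\sum_{k=1}^{s_R} n_{R,k} \le J_R} \Big[ \Big\{ n_0! \prod_{r=1}^{R} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!} \Big\} \prod_{k=1}^{s_R} x_{R,k}^{\,n_{R,k}} \Big] \tag{11} $$

Rearranging and further factoring (11) results in

$$ \sum_{\sum_{k=1}^{s_R} n_{R,k} \le J_R} \Big[ \Big\{ n_0! \prod_{r=1}^{R-1} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!} \Big\} \frac{x_{R,0}^{\,n_{R,0}}}{n_{R,0}!} \prod_{k=1}^{s_R} x_{R,k}^{\,n_{R,k}} \Big] $$

$$ = \Big( \prod_{r=1}^{R-1} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!} \Big) \Big[ \sum_{m_R=0}^{J_R}\ \sum_{\sum_{k=1}^{s_R} n_{R,k} = m_R} \Big\{ n_0!\, \frac{x_{R,0}^{\,n_{R,0}}}{n_{R,0}!} \Big\} \prod_{k=1}^{s_R} x_{R,k}^{\,n_{R,k}} \Big] $$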


Noting that $n_{r,0} = J_r - m_r$ further yields

$$ \Big( \prod_{r=1}^{R-1} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!} \Big) \Big[ \sum_{m_R=0}^{J_R} \Big( \Big\{ n_0!\, \frac{x_{R,0}^{\,n_{R,0}}}{n_{R,0}!} \Big\} \sum_{\sum_{k=1}^{s_R} n_{R,k} = m_R}\ \prod_{k=1}^{s_R} x_{R,k}^{\,n_{R,k}} \Big) \Big] \tag{12} $$

The inner-most factor of (12) is recognized as a recursive function of the form

$$ \sum_{\sum_{k=1}^{s_R} n_{R,k} = m_R}\ \prod_{k=1}^{s_R} x_{R,k}^{\,n_{R,k}} = \sum_{\sum_{k=1}^{s_R-1} n_{R,k} = m_R}\ \prod_{k=1}^{s_R-1} x_{R,k}^{\,n_{R,k}} \; + \; x_{R,s_R} \sum_{\sum_{k=1}^{s_R} n_{R,k} = m_R-1}\ \prod_{k=1}^{s_R} x_{R,k}^{\,n_{R,k}} $$

This can be expressed on a term by term basis by a family of auxiliary functions, one for each ICS, in a form similar to (6) by letting:

$$ g_r(m_r) = g_r(m_r;s_r) = g_r(m_r;s_r-1) + x_{r,s_r}\, g_r(m_r-1;s_r) $$

where

$$ g_r(0;s_r) = 1, \qquad \text{and} $$

$$ g_r(m_r;1) = x_{r,1}^{\,m_r} $$
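The recursion for the per-ICS auxiliary function can be evaluated with a single in-place pass per device. The following Python sketch is illustrative only: the names `g_table` and `g_brute` are introduced here, and the load factors in the usage lines are made-up values.

```python
from itertools import product


def g_table(x, max_jobs):
    """Return [g_r(0), ..., g_r(max_jobs)] for one ICS.

    x[k-1] holds the load factor x_{r,k}.  Each pass applies
    g_r(m; k) = g_r(m; k-1) + x_{r,k} * g_r(m-1; k) in place.
    """
    g = [x[0] ** m for m in range(max_jobs + 1)]    # g_r(m; 1) = x_{r,1}^m
    for xk in x[1:]:
        for m in range(1, max_jobs + 1):
            g[m] += xk * g[m - 1]                   # g[m-1] already holds g(m-1; k)
    return g


def g_brute(x, m):
    """Direct sum over all (n_1, ..., n_s) with n_1 + ... + n_s = m."""
    total = 0.0
    for n in product(range(m + 1), repeat=len(x)):
        if sum(n) == m:
            term = 1.0
            for xk, nk in zip(x, n):
                term *= xk ** nk
            total += term
    return total


x_example = [1.0, 0.6, 0.3]          # hypothetical x_{r,1}, x_{r,2}, x_{r,3}
table = g_table(x_example, 4)
```

The table needs only $J_r + 1$ words per ICS, which is the source of the linear storage discussed later in this section.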


Therefore, (12) becomes

$$ \Big( \prod_{r=1}^{R-1} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!} \Big) \Big[ \sum_{m_R=0}^{J_R} \frac{x_{R,0}^{\,n_{R,0}}}{n_{R,0}!}\, n_0!\, g_R(m_R) \Big] \tag{13} $$


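Substituting (13) into (10) results in

$$ G(J) = \sum_{\sum_{j=1}^{s_i} n_{i,j} \le J_i,\ \forall i \le R-1} \Big\{ \prod_{i=1}^{R-1} \Big( \prod_{k=1}^{s_i} f_{i,k}(n_{i,k})\, \frac{x_{i,0}^{\,n_{i,0}}}{n_{i,0}!} \Big) \Big[ \sum_{m_R=0}^{J_R} \frac{x_{R,0}^{\,n_{R,0}}}{n_{R,0}!}\, n_0!\, g_R(m_R) \Big] \Big\} $$

$$ = \sum_{\sum_{j=1}^{s_i} n_{i,j} \le J_i,\ \forall i \le R-1} \Big\{ \prod_{i=1}^{R-1} \Big( \frac{x_{i,0}^{\,n_{i,0}}}{n_{i,0}!} \prod_{k=1}^{s_i} f_{i,k}(n_{i,k}) \Big) \Big[ \sum_{m_R=0}^{J_R} \frac{x_{R,0}^{\,n_{R,0}}}{n_{R,0}!}\, n_0!\, g_R(m_R) \Big] \Big\} $$

Repeating the above partitioning and factoring process results in the following

$$ G(J) = \sum_{m_1=0}^{J_1} \Big\{ g_1(m_1) \frac{x_{1,0}^{\,n_{1,0}}}{n_{1,0}!} \Big\{ \cdots \Big\{ \sum_{m_{R-1}=0}^{J_{R-1}} g_{R-1}(m_{R-1}) \frac{x_{R-1,0}^{\,n_{R-1,0}}}{n_{R-1,0}!} \Big\{ \sum_{m_R=0}^{J_R} \frac{x_{R,0}^{\,n_{R,0}}}{n_{R,0}!}\, n_0!\, g_R(m_R) \Big\} \Big\} \cdots \Big\} \Big\} \tag{14} $$

Letting

$$ h(i;m_i) = g_i(m_i)\, \frac{x_{i,0}^{\,n_{i,0}}}{n_{i,0}!} $$

results in

$$ G(J) = \sum_{m_1=0}^{J_1} h(1;m_1) \Big[ \cdots \Big[ \sum_{m_{R-1}=0}^{J_{R-1}} h(R-1;m_{R-1}) \Big[ \sum_{m_R=0}^{J_R} h(R;m_R)\, n_0! \Big] \Big] \cdots \Big] \tag{15} $$

$$ = \sum_{m_1=0}^{J_1} \cdots \sum_{m_R=0}^{J_R} \Big[ n_0! \prod_{i=1}^{R} h(i;m_i) \Big] \tag{16} $$

$$ = \sum_{n_0=0}^{K} n_0! \Big[ \sum_{\sum_{i=1}^{R} n_{i,0} = n_0}\ \prod_{i=1}^{R} h(i;J_i-n_{i,0}) \Big] \tag{17} $$

$$ = \sum_{m_1=0}^{J_1} \cdots \sum_{m_R=0}^{J_R} f_0(J-M) \prod_{i=1}^{R} g_i(m_i) \tag{18} $$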

with

$$ f_0(M) = \sum_{i=1}^{R} x_{i,0}\, f_0(M-d_i). $$

Using the form of (18) a recursive relation can be formulated similar to (6) as follows

$$ G(J) = \sum_{m_1=0}^{J_1} \cdots \sum_{m_R=0}^{J_R} \Big\{ f_0(J-M) \prod_{i=1}^{R} g_i(m_i) \Big\} $$

$$ = \sum_{m_1=0}^{J_1} \cdots \sum_{m_R=0}^{J_R} \Big\{ \Big[ \sum_{r=1}^{R} x_{r,0}\, f_0(J-d_r-M) \Big] \prod_{i=1}^{R} g_i(m_i) \Big\} $$

$$ = f_0(0) \prod_{i=1}^{R} g_i(J_i) + \sum_{r=1}^{R} x_{r,0} \Big\{ \sum_{m_1=0}^{J_1} \cdots \sum_{m_r=0}^{J_r-1} \cdots \sum_{m_R=0}^{J_R} f_0(J-d_r-M) \prod_{i=1}^{R} g_i(m_i) \Big\} $$

Noting that

$$ G(J-d_r) = \sum_{m_1=0}^{J_1} \cdots \sum_{m_r=0}^{J_r-1} \cdots \sum_{m_R=0}^{J_R} f_0(J-d_r-M) \prod_{i=1}^{R} g_i(m_i), \qquad \text{and} $$

$$ f_0(0) = 1, $$

yields a recursive relation for the normalization constant as

$$ G(J) = \prod_{i=1}^{R} g_i(J_i) + \sum_{r=1}^{R} x_{r,0}\, G(J-d_r), \tag{19} $$

where G(0) = 1.
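As a numerical cross-check, the direct sum (18) and the recursion (19) can be compared on a small case. The Python sketch below is an illustration under assumptions: the function names and the load factors are hypothetical, terms of (19) with a negative population are taken as zero (the corresponding sums are empty), and the memoized recursion keeps one value per population vector, so it does not have the linear storage of the algorithms developed later.

```python
from functools import lru_cache
from itertools import product
from math import factorial, prod


def g_table(x, max_jobs):
    """g_r(m) for m = 0..max_jobs via g(m; k) = g(m; k-1) + x_k * g(m-1; k)."""
    g = [x[0] ** m for m in range(max_jobs + 1)]
    for xk in x[1:]:
        for m in range(1, max_jobs + 1):
            g[m] += xk * g[m - 1]
    return g


def f0(N0, x0):
    """SPR product factor (5a): n_0! * prod_i x_{i,0}^{n_{i,0}} / n_{i,0}!."""
    return factorial(sum(N0)) * prod(xi ** n / factorial(n) for xi, n in zip(x0, N0))


def G_direct(J, x0, g):
    """Equation (18): sum over M of f_0(J - M) * prod_i g_i(m_i)."""
    total = 0.0
    for M in product(*(range(Ji + 1) for Ji in J)):
        N0 = [Ji - mi for Ji, mi in zip(J, M)]
        total += f0(N0, x0) * prod(g[i][m] for i, m in enumerate(M))
    return total


def G_recursive(J, x0, g):
    """Equation (19): G(J) = prod_i g_i(J_i) + sum_r x_{r,0} * G(J - d_r)."""
    @lru_cache(maxsize=None)
    def G(Jt):
        if min(Jt) < 0:                    # empty sum: no such population
            return 0.0
        value = prod(g[i][Ji] for i, Ji in enumerate(Jt))
        for r, xr in enumerate(x0):
            value += xr * G(Jt[:r] + (Jt[r] - 1,) + Jt[r + 1:])
        return value
    return G(tuple(J))


J = (2, 3)                                 # hypothetical job allocation
x0 = [0.5, 0.8]                            # hypothetical x_{i,0}
x = [[1.0, 0.6, 0.3], [1.0, 0.4]]          # hypothetical x_{i,j}, j >= 1
g = [g_table(x[i], J[i]) for i in range(len(J))]
G_a, G_b = G_direct(J, x0, g), G_recursive(J, x0, g)
```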

The above normalization constant is computed as the convolution of the product factors for the SPR and the auxiliary "g" functions for each ICS over the range of the job class allocation vector. This means that due to the structure of the SCS we are able to represent each ICS by a single composite "g" function rather than by a set of product factors, one for each device. The "g" functions represent the convolution of the product factors for each device in an ICS, and can be computed efficiently.

The state equilibrium probability is obtained by solving for G(J), using any of (15) through (19), which can then be substituted into (3) to yield

$$ P(n) = \frac{1}{G(J)}\, f_0(N_0) \prod_{i=1}^{R} \prod_{j=1}^{s_i} f_{i,j}(n_{i,j}) \tag{20} $$


B.  Performance  Measures 

Having established the basic relationships to compute the state probabilities, we shall now utilize them to form relationships for some performance measures. We shall temporarily exclude the SPR from the following. The threshold equilibrium queue length probability distribution, which is the marginal probability that device (i,j) is serving k or more jobs, is

$$ P[n_{i,j} \ge k] = \sum_{n\, \ni\, n_{i,j} \ge k} P(n) = \frac{1}{G(J)} \sum_{n\, \ni\, n_{i,j} \ge k} f_0(N_0) \prod_{r=1}^{R} \prod_{t=1}^{s_r} f_{r,t}(n_{r,t}) \tag{21} $$

From (5) the expression for $f_{r,t}(n_{r,t})$ is $x_{r,t}^{\,n_{r,t}}$ and, therefore, when $n_{i,j} \ge k$ a factor of $x_{i,j}^{\,k}$ can be extracted from (21). This extraction changes the job class allocation vector over which sums are taken, from $J = (J_1, \ldots, J_R)$ to $J' = (J_1, \ldots, J_i-k, \ldots, J_R)$, and also causes a corresponding change in the state-space from n to n'. Applying these transformations to (21) results in

$$ P[n_{i,j} \ge k] = \frac{1}{G(J)}\, x_{i,j}^{\,k} \sum_{n'} f_0(N_0) \prod_{r=1}^{R} \prod_{t=1}^{s_r} f_{r,t}(n_{r,t}) $$

Noting the similarity of the summation portion of the above expression to (9) results in

$$ P[n_{i,j} \ge k] = \frac{G(J')}{G(J)}\, x_{i,j}^{\,k} = \frac{G(J-k\,d_i)}{G(J)}\, x_{i,j}^{\,k}, \qquad \text{for } i,j \neq 0 \tag{22} $$


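If the marginal probability of device (i,j) is desired, it may be expressed as

$$ P[n_{i,j} = k] = P[n_{i,j} \ge k] - P[n_{i,j} \ge k+1] $$

$$ = \frac{1}{G(J)} \Big[ x_{i,j}^{\,k}\, G(J-k\,d_i) - x_{i,j}^{\,k+1}\, G(J-(k+1)\,d_i) \Big] $$

$$ = \frac{x_{i,j}^{\,k}}{G(J)} \Big[ G(J-k\,d_i) - x_{i,j}\, G(J-(k+1)\,d_i) \Big], \qquad \text{for } i,j \neq 0 \tag{23} $$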


Of more interest than these probabilities are the performance measures of device (i,j), such as the busy probability, $A_{i,j}$, the mean queue length, $Q_{i,j}$, and average throughput, $T_{i,j}$. The device busy probability is obtained from the threshold marginal probability by noting that

$$ A_{i,j} = P[n_{i,j} \ge 1] = \frac{G(J-d_i)}{G(J)}\, x_{i,j}, \qquad \text{for } i,j \neq 0 \tag{24} $$


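The mean queue length of device (i,j) is by definition

$$ Q_{i,j} = \sum_{k=1}^{J_i} k\, P[n_{i,j} = k] $$

$$ = \sum_{k=1}^{J_i} k\, \big\{ P[n_{i,j} \ge k] - P[n_{i,j} \ge k+1] \big\} $$

$$ = \sum_{k=1}^{J_i} k\, P[n_{i,j} \ge k] - \sum_{k=1}^{J_i} k\, P[n_{i,j} \ge k+1] $$

$$ = \sum_{k=1}^{J_i} k\, P[n_{i,j} \ge k] - \Big\{ \sum_{k=2}^{J_i} (k-1)\, P[n_{i,j} \ge k] + J_i\, P[n_{i,j} \ge J_i+1] \Big\} $$

Noting that $P[n_{i,j} > J_i] = 0$ results in

$$ Q_{i,j} = P[n_{i,j} \ge 1] + \sum_{k=2}^{J_i} k\, P[n_{i,j} \ge k] - \Big\{ \sum_{k=2}^{J_i} k\, P[n_{i,j} \ge k] - \sum_{k=2}^{J_i} P[n_{i,j} \ge k] \Big\} $$

$$ = \sum_{k=1}^{J_i} P[n_{i,j} \ge k], \qquad \text{for } i,j \neq 0 \tag{25} $$

Substituting (22) into (25) yields

$$ Q_{i,j} = \frac{1}{G(J)} \sum_{k=1}^{J_i} x_{i,j}^{\,k}\, G(J-k\,d_i), \qquad \text{for } i,j \neq 0 \tag{26} $$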


The device throughput when the service rate is independent of the queue size is defined as

$$ T_{i,j} = \sum_{k=1}^{J_i} u_{i,j}\, P[n_{i,j} = k] = u_{i,j} \sum_{k=1}^{J_i} P[n_{i,j} = k] = u_{i,j}\, \big\{ 1 - P[n_{i,j} = 0] \big\} $$

$$ = u_{i,j}\, A_{i,j} = \frac{G(J-d_i)}{G(J)}\, e_{i,j}, \qquad \text{for } i,j \neq 0. \tag{27} $$

The i-th ICS throughput for service rates independent of queue size is defined to be that of its CPU, which is

$$ T_i = \frac{G(J-d_i)}{G(J)}\, e_{i,1} \tag{28} $$

It should be noted that although this measure is referred to as throughput, it may more properly be thought of as effective processing rate or departure rate, as can be seen from its definition in (27).

Similar measures for the SPR will now be derived. The SPR busy probability is

$$ A_0 = P[n_0 \ge 1] = 1 - P[n_0 = 0] \tag{29} $$

From (1), it can be seen that when $n_0 = 0$, $n_{i,0} = 0$ and $\sum_{j=1}^{s_i} n_{i,j} = J_i$ for i = 1, ..., R. From (5), when $n_0 = 0$ it can be seen that $f_0(0) = 1$. Substituting this into (20) yields

$$ P[n_0 = 0] = \frac{1}{G(J)} \sum_{\sum_{j=1}^{s_i} n_{i,j} = J_i,\ \forall i}\ \prod_{i=1}^{R} \prod_{j=1}^{s_i} f_{i,j}(n_{i,j}) $$

Repeating the same partitioning and factoring process used to obtain (14) results in

$$ P[n_0 = 0] = \frac{1}{G(J)} \prod_{i=1}^{R} g_i(J_i) $$

Substituting this into (29) yields

$$ A_0 = P[n_0 \ge 1] = 1 - \frac{\prod_{i=1}^{R} g_i(J_i)}{G(J)} \tag{30} $$


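To obtain the mean queue length of the SPR its aggregate marginal probability, independent of class, must first be computed. This may be expressed as

$$ P[n_0 = k] = \sum_{n\, \ni\, \sum_{i=1}^{R} n_{i,0} = k} P(n) $$

$$ = \frac{1}{G(J)} \sum_{n\, \ni\, \sum_{i} n_{i,0} = k} f_0(N_0) \prod_{i=1}^{R} \prod_{j=1}^{s_i} f_{i,j}(n_{i,j}) $$

$$ = \frac{1}{G(J)} \sum_{n\, \ni\, \sum_{i} n_{i,0} = k} \Big\{ \Big[ k! \prod_{r=1}^{R} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!} \Big] \Big[ \prod_{i=1}^{R} \prod_{j=1}^{s_i} x_{i,j}^{\,n_{i,j}} \Big] \Big\} $$

$$ = \frac{k!}{G(J)} \sum_{n\, \ni\, \sum_{i} n_{i,0} = k}\ \prod_{r=1}^{R} \Big\{ \Big[ \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!} \Big] \prod_{j=1}^{s_r} x_{r,j}^{\,n_{r,j}} \Big\} $$

$$ = \frac{k!}{G(J)} \sum_{\sum_{i} n_{i,0} = k} \Big[ \sum_{n - N_0}\ \prod_{r=1}^{R} \Big\{ \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!} \prod_{j=1}^{s_r} x_{r,j}^{\,n_{r,j}} \Big\} \Big] $$

$$ = \frac{k!}{G(J)} \sum_{\sum_{i} n_{i,0} = k} \Big\{ \prod_{r=1}^{R} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!} \Big[ \sum_{\sum_{j=1}^{s_r} n_{r,j} = J_r - n_{r,0}}\ \prod_{j=1}^{s_r} x_{r,j}^{\,n_{r,j}} \Big] \Big\} $$

Substituting, into the above, as in (13) yields an expression for the aggregate marginal probability for the SPR as

$$ P[n_0 = k] = \frac{k!}{G(J)} \sum_{\sum_{i=1}^{R} n_{i,0} = k} \Big[ \prod_{r=1}^{R} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!}\, g_r(J_r - n_{r,0}) \Big] = \frac{k!}{G(J)} \sum_{\sum_{i=1}^{R} n_{i,0} = k}\ \prod_{r=1}^{R} h(r;J_r - n_{r,0}) \tag{31} $$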


Forming the defining equation for the SPR mean queue length and then substituting (31) results in

$$ Q_0 = \sum_{k=1}^{K} k\, P[n_0 = k] = \sum_{k=1}^{K} k\, \Big\{ \frac{k!}{G(J)} \sum_{\sum_{i=1}^{R} n_{i,0} = k\ \ni\ n_{i,0} \le J_i}\ \prod_{r=1}^{R} h(r;J_r - n_{r,0}) \Big\} \tag{32} $$


Also of interest is the mean queue length by job class. The SPR is the only device that processes multiple job classes and is the only device where this performance measure differs from the aggregate mean queue length. The equation for the SPR class marginal probability, the probability of k class r jobs at the SPR, is

$$ P[n_{r,0} = k] = \sum_{n\, \ni\, n_{r,0} = k} P(n), \qquad \text{for } 0 \le k \le J_r $$

$$ = \frac{1}{G(J)} \sum_{n_{1,0}=0}^{J_1} \cdots \sum_{n_{r,0}=k}^{k} \cdots \sum_{n_{R,0}=0}^{J_R} \Big\{ n_0! \prod_{i=1}^{R} h(i;J_i - n_{i,0}) \Big\} \tag{33} $$

From this the mean queue length for a class r job is defined as

$$ Q_{r,0} = \sum_{k=1}^{J_r} k\, P[n_{r,0} = k] $$

$$ = \sum_{k=1}^{J_r} k\, \Big[ \frac{1}{G(J)} \sum_{n_{1,0}=0}^{J_1} \cdots \sum_{n_{r,0}=k}^{k} \cdots \sum_{n_{R,0}=0}^{J_R} \Big\{ n_0! \prod_{i=1}^{R} h(i;J_i - n_{i,0}) \Big\} \Big] $$

$$ = \frac{1}{G(J)} \sum_{n_{1,0}=0}^{J_1} \cdots \sum_{n_{r,0}=1}^{J_r} \cdots \sum_{n_{R,0}=0}^{J_R} \Big\{ n_0!\; n_{r,0} \prod_{i=1}^{R} h(i;J_i - n_{i,0}) \Big\}. \tag{34} $$

Forming the defining equation for the throughput of the SPR and then substituting (30) results in

$$ T_0 = u_0\, A_0 = u_0 \Big\{ 1 - \frac{\prod_{i=1}^{R} g_i(J_i)}{G(J)} \Big\} \tag{35} $$
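The measures derived in this section can be checked numerically on a small network. The Python sketch below is an illustration rather than the report's program: all function names and load factors are assumptions, G is evaluated by the plain sum (16), and the throughputs are omitted because they additionally need the service rates $u_{i,j}$ and $u_0$. A useful sanity check is that the mean queue lengths of class i over its own devices and the SPR add up to $J_i$.

```python
from itertools import product
from math import factorial, prod


def g_table(x, max_jobs):
    """g_r(m) for m = 0..max_jobs via g(m; k) = g(m; k-1) + x_k * g(m-1; k)."""
    g = [x[0] ** m for m in range(max_jobs + 1)]
    for xk in x[1:]:
        for m in range(1, max_jobs + 1):
            g[m] += xk * g[m - 1]
    return g


def spr_states(J):
    """All SPR population vectors (n_{1,0}, ..., n_{R,0}) with n_{i,0} <= J_i."""
    return product(*(range(Ji + 1) for Ji in J))


def weight(N0, J, x0, g):
    """Summand of (16): n_0! * prod_i g_i(J_i - n_{i,0}) x_{i,0}^n / n!."""
    return factorial(sum(N0)) * prod(
        g[i][J[i] - n] * x0[i] ** n / factorial(n) for i, n in enumerate(N0))


def G(J, x0, g):
    return sum(weight(N0, J, x0, g) for N0 in spr_states(J))


def measures(J, x0, x):
    R = len(J)
    g = [g_table(x[i], J[i]) for i in range(R)]
    GJ = G(J, x0, g)

    def shifted(i, k):
        """Return J - k*d_i."""
        return tuple(Jr - k * (r == i) for r, Jr in enumerate(J))

    busy = [[x[i][j] * G(shifted(i, 1), x0, g) / GJ               # (24)
             for j in range(len(x[i]))] for i in range(R)]
    qlen = [[sum(x[i][j] ** k * G(shifted(i, k), x0, g)           # (26)
                 for k in range(1, J[i] + 1)) / GJ
             for j in range(len(x[i]))] for i in range(R)]
    busy0 = 1.0 - prod(g[i][J[i]] for i in range(R)) / GJ         # (30)
    qlen0 = [sum(N0[r] * weight(N0, J, x0, g)                     # (34)
                 for N0 in spr_states(J)) / GJ for r in range(R)]
    return busy, qlen, busy0, qlen0


J = (2, 3)                                   # hypothetical job allocation
x0 = [0.5, 0.8]                              # hypothetical x_{i,0}
x = [[1.0, 0.6], [1.0, 0.4, 0.7]]            # hypothetical x_{i,j}, j >= 1
busy, qlen, busy0, qlen0 = measures(J, x0, x)
```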


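We have derived the standard queueing network probabilities and performance measures for the SCS model, which are recapitulated here:

The normalization constant

$$ G(J) = \sum_{n_{1,0}=0}^{J_1} \cdots \sum_{n_{R,0}=0}^{J_R} \Big\{ f_0(N_0) \prod_{i=1}^{R} g_i(J_i - n_{i,0}) \Big\}, \qquad \text{where} \tag{36} $$

$$ f_0(N_0) = \sum_{i=1}^{R} x_{i,0}\, f_0(N_0 - d_i) \tag{37} $$

The device busy probability

$$ A_{i,j} = \frac{G(J-d_i)}{G(J)}\, x_{i,j}, \qquad \text{and} \tag{38} $$

$$ A_0 = 1 - \frac{\prod_{i=1}^{R} g_i(J_i)}{G(J)} \tag{39} $$

The mean queue length

$$ Q_{i,j} = \frac{1}{G(J)} \sum_{k=1}^{J_i} x_{i,j}^{\,k}\, G(J-k\,d_i), \qquad \text{for } i,j \neq 0 \tag{40} $$

$$ Q_0 = \frac{1}{G(J)} \sum_{k=1}^{K} k!\; k\, \Big\{ \sum_{\sum_{i=1}^{R} n_{i,0} = k}\ \prod_{r=1}^{R} h(r;J_r - n_{r,0}) \Big\}, \qquad \text{and} \tag{41} $$

$$ Q_{r,0} = \frac{1}{G(J)} \sum_{n_{1,0}=0}^{J_1} \cdots \sum_{n_{r,0}=1}^{J_r} \cdots \sum_{n_{R,0}=0}^{J_R} \Big\{ n_0!\; n_{r,0} \prod_{i=1}^{R} h(i;J_i - n_{i,0}) \Big\} \tag{42} $$

The device throughput

$$ T_{i,j} = u_{i,j}\, A_{i,j} = \frac{G(J-d_i)}{G(J)}\, e_{i,j}, \qquad \text{for } i,j \neq 0, \qquad \text{and} \tag{43} $$

$$ T_0 = u_0\, A_0 = u_0 \Big\{ 1 - \frac{\prod_{i=1}^{R} g_i(J_i)}{G(J)} \Big\} \tag{44} $$

The system (class) throughput

$$ T_i = \frac{G(J-d_i)}{G(J)}\, e_{i,1} \tag{45} $$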


C.   Computational  Algorithms 

As discussed previously, the general iterative procedures for computation of the normalization constant and other performance measures require substantial memory space. In an effort to reduce the memory required without significantly increasing the computations we have reformulated the network expressions utilizing the structure of the SCS model. In addition, we were able to derive expressions for mean queue length, which in general are not available in the literature. Current iterative algorithms to evaluate our expressions require processing time and memory space that grow exponentially. We will present algorithms to evaluate our expressions which use a minimal amount of memory and require the same order of processing time as do the existing iterative procedures.

Examining the expressions for our performance measures it can be seen that the computations are all very similar, requiring the sum over a restricted state space. Two forms of this computation are represented by (15) and (17), which we shall call the sum-of-products (SOP) expansion and the factorial (FAC) expansion, respectively. Each form has its advantages and disadvantages. The SOP expansion minimizes the number of multiplications, but places a burden on the factorial computation since the value is not monotonically changing. The FAC expansion simplifies the factorial computation, but requires generation of all states in a restricted sub-space. We will present efficient algorithms for both evaluation forms and for the generation of a restricted sub-space.

The SOP algorithm requires an efficient method to evaluate the factorial, $n_0!$, in the inner-most product term of (15). This value is a function of all the indices and, therefore, is continually changing during the evaluation of (15). If the value of $n_0$ were monotonically increasing then an efficient method to compute the next factorial value $n_0!$, based on its previous value, is the well known recursion

$$ n_0! = (n_0 - 1)!\; n_0 $$

In our case the value of $n_0$ varies in a cycle which first monotonically increases and then abruptly decreases. This decrease occurs at well defined points: when any product term "sum-limit" is reached. By keeping track of the last value to be factorialized and its factorial value for each product term, the above efficient method may still be applied.

The SOP expansion is of the following form (note: $m_i = J_i - n_{i,0}$):

$$ G(J) = \sum_{n_{1,0}=0}^{J_1} h(1;J_1 - n_{1,0}) \Big[ \cdots \Big[ \sum_{n_{R-1,0}=0}^{J_{R-1}} h(R-1;J_{R-1} - n_{R-1,0}) \Big[ \sum_{n_{R,0}=0}^{J_R} h(R;J_R - n_{R,0})\, n_0! \Big] \Big] \cdots \Big]. $$

Defining three temporary vectors as follows:

$T = (t_1, \ldots, t_R)$ is the accumulated sum of job class distribution, where

$$ t_r = \sum_{i=1}^{r} n_{i,0}, \qquad \text{note: } t_R = n_0 $$

$W = (w_1, \ldots, w_R)$ is the accumulated factorial values, where

$$ w_r = t_r!, \qquad \text{note: } w_R = n_0! $$

$V = (v_1, \ldots, v_R)$ is the accumulated summation value of the product terms, where

$$ v_i = \sum_{n_{i,0}=0}^{J_i} h(i;J_i - n_{i,0}) \Big[ \cdots \Big[ \sum_{n_{R-1,0}=0}^{J_{R-1}} h(R-1;J_{R-1} - n_{R-1,0}) \Big[ \sum_{n_{R,0}=0}^{J_R} h(R;J_R - n_{R,0})\, n_0! \Big] \Big] \cdots \Big] $$

Thus $v_1 = G(J)$, and $v_R$ is the innermost product term. The SOP algorithm proceeds as follows:

BEGIN: SOP algorithm

STEP 1: initialize 1-st vector elt.
    i = 1
    n_{i,0} = 0
    t_i = 0
    w_i = 1
    v_i = 0

STEP 2: compute remaining vector elts.
    STEP BY 1  j = i+1  TO  R
    BEGIN
        n_{j,0} = 0
        t_j = t_{j-1}
        w_j = w_{j-1}
        v_j = 0
    END

STEP 3: compute inner-most product term
    STEP BY 1  n_{R,0} = 0  TO  J_R
    BEGIN
        v_R = v_R + w_R * h(R; J_R - n_{R,0})
        t_R = t_R + 1
        w_R = w_R * t_R
    END
    i = R

STEP 4: expand outward accumulating product term sums
    i = i-1
    IF  i < 1  GO TO STEP 6
    v_i = v_i + v_{i+1} * h(i; J_i - n_{i,0})
    n_{i,0} = n_{i,0} + 1

STEP 5: update factorial if "sum-limit" not reached
    IF  n_{i,0} > J_i  GO TO STEP 4
    t_i = t_i + 1
    w_i = w_i * t_i
    GO TO STEP 2

STEP 6: terminate algorithm
    STOP

END: SOP algorithm

The SOP algorithm requires storage for three temporary vectors (T, V, W), each containing R elements, and for the R vectors h(i; ·), each containing $J_i + 1$ elements. Therefore, the total storage required for this algorithm is

$$ 3R + \sum_{i=1}^{R} (J_i + 1) = 4R + \sum_{i=1}^{R} J_i = 4R + K $$

To determine the number of operations needed to evaluate the SOP equation form, note that one addition and one multiplication are required for each combination of the first (outer-most) R-1 product terms. For each of these combinations the entire inner-most (R-th) product term and the factorial must be evaluated, requiring two multiplications and one addition at each step. This results in an operation count on the order of

$$ \Big[ \prod_{i=1}^{R-1} (J_i + 1) \Big] \big[ 2 + 3(J_R + 1) \big] = \big[ 5 + 3J_R \big] \Big[ \prod_{i=1}^{R-1} (J_i + 1) \Big] $$
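The nested-sum evaluation with a running factorial can be sketched in Python as follows. This is a recursive illustration, not a transcription of the GO TO steps: the function names and the load factors are assumptions, and the per-level pair `(t, w)` plays the role of the vectors T and W. Only the h tables and one `(t, w, total)` triple per nesting level are held, which matches the linear storage argument.

```python
from itertools import product
from math import factorial, prod


def g_table(x, max_jobs):
    """g_r(m) for m = 0..max_jobs via g(m; k) = g(m; k-1) + x_k * g(m-1; k)."""
    g = [x[0] ** m for m in range(max_jobs + 1)]
    for xk in x[1:]:
        for m in range(1, max_jobs + 1):
            g[m] += xk * g[m - 1]
    return g


def h_tables(J, x0, x):
    """h[i][n] = g_i(J_i - n) * x_{i,0}^n / n!, indexed by n = n_{i,0}."""
    tables = []
    for Ji, xi0, xi in zip(J, x0, x):
        g, coeff, row = g_table(xi, Ji), 1.0, []
        for n in range(Ji + 1):
            row.append(g[Ji - n] * coeff)
            coeff *= xi0 / (n + 1)          # x^n/n!  ->  x^(n+1)/(n+1)!
        tables.append(row)
    return tables


def sop(J, h):
    """Evaluate the nested sums of (15), innermost class last."""
    R = len(J)

    def level(i, t, w):
        # On entry t = n_{1,0} + ... + n_{i-1,0} and w = t!
        total = 0.0
        for n in range(J[i] + 1):
            if i == R - 1:
                total += w * h[i][n]                    # h(R; .) * n_0!
            else:
                total += h[i][n] * level(i + 1, t, w)
            t += 1
            w *= t                                      # running factorial
        return total

    return level(0, 0, 1)


def brute_force(J, h):
    """Equation (16): plain sum over every SPR state, recomputing n_0! each time."""
    return sum(factorial(sum(N0)) * prod(h[i][n] for i, n in enumerate(N0))
               for N0 in product(*(range(Ji + 1) for Ji in J)))


# Made-up load factors for a network shaped like example 1: R = 2, J = (5, 5).
J = (5, 5)
h = h_tables(J, x0=[0.5, 0.8], x=[[1.0, 0.6, 0.3], [1.0, 0.4, 0.7]])
G_sop, G_ref = sop(J, h), brute_force(J, h)
```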

The FAC expansion requires an additional algorithm to sum over every state in a constrained sub-space, which yields all combinations of different job classes keeping the total number of jobs constant. The sub-space is defined by all solutions to

$$ \sum_{i=1}^{R} n_{i,0} = k $$

with the constraint of

$$ n_{i,0} \le J_i, \qquad \forall i $$

To sum over the constrained sub-space start with class 1 jobs. Next, from a total of k jobs determine the maximum and minimum number of jobs that can be allocated to classes 2 through R in conjunction with the constraints, $0 \le n_{i,0} \le J_i$. Then determine the allowable range of class 1 jobs based on the maximum and minimum values just computed. Stepping through the range of class 1 jobs, determine the maximum and minimum number of jobs that can be allocated to classes 3 through R from the remaining $k - n_{1,0}$ jobs. Then compute the allowable range of class 2 jobs. Continuing this procedure for each job class results in the following state-space generation process

$$ \sum_{n_{1,0}=\mathrm{MAX}[0,q_1]}^{\mathrm{MIN}[J_1,m_1]} \cdots \sum_{n_{R,0}=\mathrm{MAX}[0,q_R]}^{\mathrm{MIN}[J_R,m_R]} $$

where $m_i$ represents the number of jobs to be distributed over queues i through R, and is expressed as

$$ m_i = \begin{cases} m_{i-1} - n_{i-1,0}, & R \ge i > 1 \\ k, & i = 1 \end{cases} $$

and $q_i$ represents the minimum number of jobs that must be placed in the i-th queue (which may be negative), and is expressed as

$$ q_i = \begin{cases} m_i - t_{i+1}, & i < R \\ m_i, & i = R \end{cases} $$

where $t_{i+1} = \sum_{r=i+1}^{R} J_r$ is the maximum number of jobs that queues i+1 through R can hold.
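The state-space generation process can be sketched as a Python generator. The name `bounded_compositions` and the 0-based indexing are choices made for this illustration; the bounds `max(0, q)` and `min(J[i], m)` correspond to MAX[0, q_i] and MIN[J_i, m_i] in the text.

```python
def bounded_compositions(k, J):
    """Yield every (n_1, ..., n_R) with n_1 + ... + n_R = k and 0 <= n_i <= J_i."""
    R = len(J)
    # t[i] = J_i + ... + J_R (0-based i): the most jobs queues i..R can hold
    t = [0] * (R + 1)
    for i in range(R - 1, -1, -1):
        t[i] = t[i + 1] + J[i]

    def fill(i, m, prefix):
        # m jobs remain to be distributed over queues i..R
        if i == R - 1:
            if 0 <= m <= J[i]:
                yield prefix + (m,)
            return
        q = m - t[i + 1]                   # minimum that must go to queue i
        for n in range(max(0, q), min(J[i], m) + 1):
            yield from fill(i + 1, m - n, prefix + (n,))

    yield from fill(0, k, ())


J = (2, 1, 3)                              # hypothetical job allocation
states_k5 = list(bounded_compositions(5, J))
total_states = sum(1 for k in range(sum(J) + 1)
                   for _ in bounded_compositions(k, J))
```

Because the lower bound discards allocations that the remaining queues could not absorb, no infeasible partial state is ever visited; over all k the generator produces each of the $\prod_i (J_i+1)$ SPR states exactly once.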


Utilizing the above, the complete FAC expansion can be formulated as

$$ G(J) = \sum_{k=0}^{K} k! \Big[ \sum_{n_{1,0}=\mathrm{MAX}[0,q_1]}^{\mathrm{MIN}[J_1,m_1]} h(1;J_1 - n_{1,0}) \cdots \sum_{n_{R-1,0}=\mathrm{MAX}[0,q_{R-1}]}^{\mathrm{MIN}[J_{R-1},m_{R-1}]} h(R-1;J_{R-1} - n_{R-1,0})\; h(R;J_R - m_{R-1} + n_{R-1,0}) \Big] $$

Defining five temporary vectors as follows:

$M = (m_1, \ldots, m_R)$ is the maximum set of jobs to be distributed over queues i to R, where

$$ m_i = \begin{cases} m_{i-1} - n_{i-1,0}, & R \ge i > 1 \\ k, & i = 1 \end{cases} $$

$Q = (q_1, \ldots, q_R)$ is the minimum number of jobs that the i-th queue can accept, where

$$ q_i = \begin{cases} m_i - t_{i+1}, & i < R \\ m_i, & i = R \end{cases} $$

$T = (t_1, \ldots, t_R)$ is the maximum number of jobs that may be allocated to queues r to R, where

$$ t_r = \sum_{i=r}^{R} J_i, \qquad \text{note: } t_1 = K, $$

$V = (v_1, \ldots, v_R)$ is the accumulated summation of the product terms, where

$$ v_i = \begin{cases} \displaystyle \sum_{k=0}^{K} k! \Big[ \sum_{\sum_{r=1}^{R} n_{r,0} = k}\ \prod_{r=1}^{R} h(r;J_r - n_{r,0}) \Big], & i = 1 \\[2ex] \displaystyle \sum_{n_{i-1,0}=\mathrm{MAX}[0,q_{i-1}]}^{\mathrm{MIN}[J_{i-1},m_{i-1}]} h(i-1;J_{i-1} - n_{i-1,0}) \cdots \sum_{n_{R,0}=\mathrm{MAX}[0,q_R]}^{\mathrm{MIN}[J_R,m_R]} h(R;J_R - n_{R,0}), & i > 1, \end{cases} $$

$UP = (up_1, \ldots, up_R)$ is the maximum number of jobs that can be allocated to the i-th queue, where

$$ up_i = \mathrm{MIN}[J_i, m_i] $$
The FAC algorithm proceeds as follows:

BEGIN: FAC algorithm

STEP 1: initialize
    fac = 1
    t_R = J_R
    v_R = 0
    k = -1
    STEP BY -1  i = R-1  TO  1
    BEGIN
        v_i = 0
        t_i = t_{i+1} + J_i
    END

STEP 2: compute 1-st elt of temporary distribution vectors
    k = k+1
    IF  k > t_1  GO TO STEP 8
    m_1 = k
    q_1 = m_1 - t_2
    n_{1,0} = MAX[0, q_1]
    up_1 = MIN[J_1, m_1]
    i = 2

STEP 3: compute remaining vector elements
    STEP BY 1  r = i  TO  R-1
    BEGIN
        m_r = m_{r-1} - n_{r-1,0}
        q_r = m_r - t_{r+1}
        n_{r,0} = MAX[0, q_r]
        up_r = MIN[J_r, m_r]
    END

STEP 4: compute inner-most product term
    STEP BY 1  n_{R-1,0} = n_{R-1,0}  TO  up_{R-1}
    BEGIN
        n_{R,0} = m_{R-1} - n_{R-1,0}
        v_R = v_R + h(R-1; J_{R-1} - n_{R-1,0}) * h(R; J_R - n_{R,0})
    END
    i = R

STEP 5: expand outward accumulating product term sums
    i = i-1
    IF  i < 2  GO TO STEP 7
    v_i = v_i + v_{i+1} * h(i-1; J_{i-1} - n_{i-1,0})
    v_{i+1} = 0
    n_{i-1,0} = n_{i-1,0} + 1

STEP 6: test if "sum-limit" reached
    IF  n_{i-1,0} <= up_{i-1}  GO TO STEP 3
    GO TO STEP 5

STEP 7: accumulate outer-sum term & update factorial
    v_1 = v_1 + v_2 * fac
    v_2 = 0
    fac = (k+1) * fac
    GO TO STEP 2

STEP 8: terminate algorithm
    STOP


END: FAC algorithm

Solution Method      Storage Requirements           Computation Requirements
-----------------    ---------------------------    -----------------------------------------
SOP                  $4R + K$                       $[5 + 3J_R]\,[\prod_{i=1}^{R-1} (J_i+1)]$
FAC                  $6R + K$                       $2\,[\prod_{i=1}^{R} (J_i+1)] + 3K$
SCS Iterative        $\prod_{i=1}^{R} (J_i+1)$      $4R\,[\prod_{i=1}^{R} (J_i+1)]$
General Iterative    $\prod_{i=1}^{R} (J_i+1)$      $2(R+L-1)\,[\prod_{i=1}^{R} (J_i+1)]$


Figure III-4.   Storage and computation complexity.

Solution Method      Example      Storage Requirements    Computation Requirements
-----------------    ---------    --------------------    ------------------------
SOP                  example 1    18                      120
                     example 2    54                      155,520
FAC                  example 1    22                      102
                     example 2    66                      93,402
SCS Iterative        example 1    36                      288
                     example 2    46,656                  1,119,744
General Iterative    example 1    36                      576
                     example 2    46,656                  2,239,488

Example 1:   R = 2,   S = (1,3,3),   K = 10,   J = (5,5)

Example 2:   R = 6,   S = (1,3,3,3,3,3,3),   K = 30,   J = (5,5,5,5,5,5)


Figure III-5.   Example of storage and computation complexity.

The FAC algorithm requires storage for five temporary vectors (M, Q, T, V, UP), each containing R elements, and for the R vectors h(i; ·), each containing $J_i + 1$ elements. Therefore, the total storage required for this algorithm is

$$ 5R + \sum_{i=1}^{R} (J_i + 1) = 6R + \sum_{i=1}^{R} J_i = 6R + K $$

The total number of states in the state space is

$$ \prod_{i=1}^{R} (J_i + 1) $$

The number of steps carried out for the inner product terms is equal to the total number of states. Each step (a value of k) requires one addition and one multiplication. The outer term for each step requires two multiplications and one addition. This results in an operation count on the order of

$$ \Big[ 2 \prod_{i=1}^{R} (J_i + 1) \Big] + \Big[ 3 \sum_{i=1}^{R} J_i \Big] = \Big[ 2 \prod_{i=1}^{R} (J_i + 1) \Big] + 3K $$
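The FAC evaluation of (17) can be sketched in Python with the same range bounds. This is an illustrative recursive version, not the GO TO listing: the name `fac`, the 0-based indexing, and the placeholder h tables (arbitrary positive numbers standing in for h(i; J_i - n_{i,0}), indexed by n_{i,0}) are assumptions. The factorial is updated once per value of k, which is the simplification the FAC form is meant to provide.

```python
from itertools import product
from math import factorial, prod


def fac(J, h):
    """Evaluate (17): sum over k of k! times the sum over SPR states with n_0 = k."""
    R = len(J)
    t = [0] * (R + 1)                      # t[i] = J_i + ... + J_R (0-based i)
    for i in range(R - 1, -1, -1):
        t[i] = t[i + 1] + J[i]

    def inner(i, m):
        """Sum of prod_r h[r][n_r] over n_i..n_R with total m and n_r <= J_r."""
        if i == R - 1:
            return h[i][m] if 0 <= m <= J[i] else 0.0
        lo, hi = max(0, m - t[i + 1]), min(J[i], m)
        return sum(h[i][n] * inner(i + 1, m - n) for n in range(lo, hi + 1))

    total, k_factorial = 0.0, 1
    for k in range(t[0] + 1):              # t[0] = K
        total += k_factorial * inner(0, k)
        k_factorial *= k + 1               # (k+1)! from k!
    return total


# Placeholder h tables for J = (2, 1, 3); the numbers carry no physical meaning.
J = (2, 1, 3)
h = [[1.0, 0.5, 0.2], [0.7, 0.3], [1.1, 0.9, 0.4, 0.1]]
G_fac = fac(J, h)
G_ref = sum(factorial(sum(N0)) * prod(h[i][n] for i, n in enumerate(N0))
            for N0 in product(*(range(Ji + 1) for Ji in J)))
```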


Figure III-4 provides a summary of the storage and computational requirements of the SOP and FAC algorithms, as well as those for the general iterative procedure for multi-class queueing networks and for that procedure adapted to the SCS model. The general iterative procedure is adapted to the SCS model (SCS iterative) by representing each ICS as a single equivalent device [CHANDY 75B, GIAMMO 76]; the equivalent number of devices L-1 therefore becomes R. Note that the storage requirements for the SOP and FAC algorithms increase linearly with the number and distribution of jobs, K and J, and ICSs, R, while the storage requirements of previous algorithms increase exponentially. In figure III-5 these requirements are evaluated for two examples: the first is a small network of 2 computer systems (or job classes) with a total of 7 devices and with 10 jobs equally allocated between the 2 systems; the second is a moderate network comprising 6 computer systems (or job classes) with a total of 19 devices and with 30 jobs equally allocated among the 6 systems.

It can be seen that the SOP and FAC algorithms require very little storage compared to both iterative algorithms, while also requiring fewer computations. In addition, the mean queue length for all devices in the SCS model can be computed using either of the algorithms, whereas in the general case no effective procedure yet exists. It should be noted, however, that by using the iterative procedures relatively little additional computation is needed to obtain the device busy probability and the mean queue length for all but the SPR. Computing these measures using the SOP or FAC algorithms entails a larger amount of computation, but also includes evaluation of performance measures for the SPR. Comparing the SOP and FAC algorithms one can see from figures III-4 and III-5 that the SOP algorithm uses less storage while the FAC algorithm requires fewer computational steps (i.e. less time).
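The entries of figure III-5 follow directly from the formulas of figure III-4 and can be reproduced with a few lines of Python. The function name `costs` and the dictionary layout are choices made for this sketch; each value is a (storage, operations) pair.

```python
from math import prod


def costs(J, S):
    """Storage and operation counts from the formulas of figure III-4."""
    R, K, L = len(J), sum(J), sum(S)
    states = prod(Ji + 1 for Ji in J)
    return {
        "SOP": (4 * R + K, (5 + 3 * J[-1]) * prod(Ji + 1 for Ji in J[:-1])),
        "FAC": (6 * R + K, 2 * states + 3 * K),
        "SCS iterative": (states, 4 * R * states),
        "General iterative": (states, 2 * (R + L - 1) * states),
    }


example1 = costs((5, 5), (1, 3, 3))
example2 = costs((5,) * 6, (1,) + (3,) * 6)
```

For example 2 the two new algorithms need 54 and 66 words, against 46,656 words for either iterative procedure.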

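To complete this discussion we shall present our expressions for the performance measures of the SCS model, equations (36) through (45), restructured into forms readily evaluated by the FAC or SOP algorithms.

Normalization constant

$$ G(J) = \sum_{n_{1,0}=0}^{J_1} h(1;J_1 - n_{1,0}) \Big[ \cdots \Big[ \sum_{n_{R-1,0}=0}^{J_{R-1}} h(R-1;J_{R-1} - n_{R-1,0}) \Big[ \sum_{n_{R,0}=0}^{J_R} h(R;J_R - n_{R,0})\, n_0! \Big] \Big] \cdots \Big], \qquad \text{or} \tag{46} $$

$$ = \sum_{n_0=0}^{K} n_0! \Big[ \sum_{n_{i,0}=\mathrm{MAX}[0,q_i]}^{\mathrm{MIN}[J_i,m_i]} h(i;J_i - n_{i,0})\; F_i(n_0 - n_{i,0}) \Big], \qquad \text{for any } i, \tag{47} $$

where,

$$ F_i(n_0 - n_{i,0}) = \sum_{\sum_{r=1,\, r \neq i}^{R} n_{r,0} = n_0 - n_{i,0}}\ \ \prod_{r=1,\, r \neq i}^{R} h(r;J_r - n_{r,0}), \qquad \text{and} $$

$$ q_i = n_0 - \sum_{r=1,\, r \neq i}^{R} J_r $$

Device busy probability

$$ A_{i,j} = \frac{x_{i,j}}{G(J)} \sum_{n_{1,0}=0}^{J_1} h(1;J_1 - n_{1,0}) \Big[ \cdots \Big[ \sum_{n_{i,0}=0}^{J_i-1} h(i;J_i - 1 - n_{i,0}) \cdots \Big[ \sum_{n_{R-1,0}=0}^{J_{R-1}} h(R-1;J_{R-1} - n_{R-1,0}) \Big[ \sum_{n_{R,0}=0}^{J_R} h(R;J_R - n_{R,0})\, n_0! \Big] \Big] \cdots \Big] \cdots \Big], \qquad \text{or} \tag{48} $$

$$ A_{i,j} = \frac{x_{i,j}}{G(J)} \sum_{n_0=0}^{K-1} n_0! \Big[ \sum_{n_{i,0}=\mathrm{MAX}[0,q_i]}^{\mathrm{MIN}[J_i-1,m_i]} h(i;J_i - 1 - n_{i,0})\; F_i(n_0 - n_{i,0}) \Big], \tag{49} $$

$$ A_0 = P[n_0 \ge 1] = 1 - \frac{\prod_{i=1}^{R} g_i(J_i)}{G(J)} \tag{50} $$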


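Mean queue length

\[
Q_{ij} = \frac{1}{G(J)} \sum_{k=1}^{J_i} x_{ij}^{k} \Bigl[ \sum_{n_{1,0}=0}^{J_1} h(1;J_1-n_{1,0}) \Bigl[ \cdots \Bigl[ \sum_{n_{i,0}=0}^{J_i-k} h(i;J_i-k-n_{i,0}) \cdots \Bigl[ \sum_{n_{R-1,0}=0}^{J_{R-1}} h(R-1;J_{R-1}-n_{R-1,0}) \Bigl[ \sum_{n_{R,0}=0}^{J_R} h(R;J_R-n_{R,0})\, n_0! \Bigr] \Bigr] \cdots \Bigr] \cdots \Bigr] \Bigr], \quad \text{or} \tag{51}
\]

\[
Q_{ij} = \frac{1}{G(J)} \sum_{k=1}^{J_i} x_{ij}^{k} \Bigl[ \sum_{n_0=0}^{K-k} n_0! \Bigl\{ \sum_{n_{i,0}=\mathrm{MAX}[0,q_i]}^{\mathrm{MIN}[J_i-k,n_0]} h(i;J_i-k-n_{i,0})\, F_i(n_0-n_{i,0}) \Bigr\} \Bigr], \quad \text{and} \tag{52}
\]

\[
Q_0 = \frac{1}{G(J)} \sum_{n_{1,0}=0}^{J_1} h(1;J_1-n_{1,0}) \Bigl[ \cdots \Bigl[ \sum_{n_{R-1,0}=0}^{J_{R-1}} h(R-1;J_{R-1}-n_{R-1,0}) \Bigl[ \sum_{n_{R,0}=0}^{J_R} h(R;J_R-n_{R,0})\, n_0!\, n_0 \Bigr] \Bigr] \cdots \Bigr], \tag{53}
\]

\[
Q_0 = \frac{1}{G(J)} \sum_{n_0=1}^{K} n_0! \Bigl[ n_0 \sum_{n_{i,0}=\mathrm{MAX}[0,q_i]}^{\mathrm{MIN}[J_i,n_0]} h(i;J_i-n_{i,0})\, F_i(n_0-n_{i,0}) \Bigr], \quad \text{and} \tag{54}
\]

\[
Q_{i0} = \frac{1}{G(J)} \sum_{n_0=1}^{K} n_0! \Bigl[ \sum_{n_{i,0}=\mathrm{MAX}[0,q_i]}^{\mathrm{MIN}[J_i,n_0]} n_{i,0}\, h(i;J_i-n_{i,0})\, F_i(n_0-n_{i,0}) \Bigr] \tag{55}
\]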


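The device throughput

\[
T_{ij} = u_{ij} A_{ij} = u_{ij}\, x_{ij}\, \frac{G(J-1_i)}{G(J)}, \quad \text{for } ij \neq 0, \quad \text{and} \tag{56}
\]

\[
T_0 = u_0 A_0 = u_0 \Bigl\{ 1 - \frac{\prod_{i=1}^{R} g_i(J_i)}{G(J)} \Bigr\} \tag{57}
\]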


These expressions can be computed simultaneously in groups: equations (46) and (53), or (47) and (54), comprise one group, and (49), (52), and (55) another. Once the values of these performance measures are computed, they can be applied directly to evaluate the remaining equations, (50), (56), and (57). The FAC and SOP algorithms can be modified to compute each group at the same time; this is especially useful for the latter group, whose members can share intermediate values (e.g. $F_i(n_0-n_{i,0})$) and, therefore, eliminate duplicate computations. A Fortran implementation of these algorithms was developed and is used later in this dissertation to compute values for these performance measures.
IV.   APPROXIMATE  SCS  MODEL


A.   The  Approximation 


Efficient algorithms for queueing networks have been previously developed [BUZEN 73, MUNTZ 74, SHUM 76], and a new algorithm that is very efficient in its memory space requirements has been presented here in chapter III. Still, it can be seen from Figure III-4 that the computation time is a significant burden; it is of exponential complexity and, therefore, the computation is intractable for all but small configurations. In addition, the complex form of the equations conveys little useful intuitive information or discernible insight.

Some previous efforts have concentrated on developing approximate solutions for various models. The primary motivations are to reduce the computation and memory-space complexity, or to generalize the modeling assumptions. These generalizations include more general service time distributions, accounting for passive resources, simultaneous acquisition of multiple resources, resource blocking, priority and other scheduling policies, state dependent routing, and others [CHANDY 78].

Kobayashi [KOBAYA 74A] has utilized the diffusion approximation to model queueing networks with general service time distributions, assuming a Poisson arrival process and a FCFS scheduling policy. This approach can potentially be used to investigate the transient state behavior of the network [KOBAYA 74B]. The diffusion approximation is primarily applicable to open networks and currently has limited utility for a closed network. Chandy [CHANDY 75A] has introduced an
aggregation  technique  similar  to  Norton's  theorem  in  electrical  circuits.  This  technique  allows  one 
to  represent  a  number  of  queues  as  a  single  equivalent  queue.  If  the  queue  satisfies  local  balance 
[CHANDY  72B],  then  the  technique  yields  exact  solutions;  if  not,  a  similar  technique  with  an 
additional  flow  approximation  procedure  yields  approximate  solutions  [CHANDY  75B].  These 
techniques  are  of  computation  time  and  memory-space  complexity  equivalent  to  those  of  the 
convolution  algorithm  of  basic  queueing  network  theory  [BUZEN  73,  MUNTZ  74,  SHUM  76]. 
Other approximation efforts have studied the effect of substituting one queueing type for another. Buzen [BUZEN 74] investigated using a mathematically less complex M/G/1 service center to approximate an M/G/1/K service center. In a later effort Buzen [BUZEN 77] approximated an M/G/1/K service center by using an M/M/1/K service center. Buzen's efforts were directed towards a single service center, whereas Shum [SHUM 76] investigated the substitution of M/G/1 product terms for M/M/1 product terms in an effort to approximate general service time distributions in a multi-class queueing network.

Avi-Itzhak  [AVI-IT  73]  used  a  conservation  of  flow  argument  to  establish  an  expression  for 
the  mean  burst  cycle  time  in  a  central  server  model.  This  expression  requires  the  mean  number  of 
busy  servers  (busy  probability)  at  a  central  server,  which  must  be  obtained  by  solving  the 
queueing  network  equations  and  summing  over  the  entire  state  space.  He  then  used  this 
parameter  along  with  an  assumed  geometric  cycle  distribution  as  an  approximation  to  the  queue 
dependent  mean  service  rate  in  a  single  server  queue.  Solving  the  basic  state  balance  equations, 
assuming  the  arrival  process  is  Poisson,  results  in  expressions  for  waiting  and  delay  times  for  the 
system. 

A major obstacle in using the basic queueing equations as approximations to queueing networks is the difficulty of relating the corresponding input parameters of the basic queueing equations to those of queueing networks. Queueing networks require the mean service rate of each device, $u_{ij}$, the transition probabilities between devices, $p_{ij}$, and the number of jobs constantly circulating in the network, K. The basic single server queueing equations require the same first parameter, but utilize arrival rate as the other.

We  shall  utilize  a  similar  conservation  of  flow  argument  as  Avi-Itzhak  to  establish  a 
relationship  between  arrival  rate  and  the  number  of  jobs  in  the  network.  From  this  we  shall  utilize 
independent  single  server  queues  to  approximate  the  behavior  of  the  SCS  queueing  network 
model. 

Assuming that each device of the SCS is an M/M/1 single server queue, it can be shown [BURKE 56, FINCH 59, BURKE 72, MUNTZ 73, KLIENR 76] that the arrival process is equivalent to the departure process. The arrival process in an M/M/1 queue is Poisson with parameter a; therefore, the mean flow rate in is equal to the mean flow rate out:

\[ rate_{in} = rate_{out} = a \]
For the CPU in each ICS the flow out is decomposed into separate Poisson flows, which proceed to the various PPUs and the SPR. The decomposition of a Poisson flow in this manner is linear [COFFMA 73, pg 149-150]. For the SCS this results in

\[
a_{CPU_i} = \sum_{j=2}^{s_i} a_{ij} + a_{SPR_i} \tag{1}
\]

and

\[
a_{ij} = p_{ij}\, a_{CPU_i}, \qquad a_{SPR_i} = p_{SPR_i}\, a_{CPU_i} = p_{i,0}\, a_{CPU_i}, \quad \text{and} \quad a_{SPR} = \sum_{i=1}^{R} a_{SPR_i} \tag{2}
\]

where the following subscript notation is adopted for clarity

SPR_i  = i,0
CPU_i  = i,1
PPU_ij = i,j      j>1
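The flow split in (1) and (2) can be illustrated with a small sketch. The function names and the numeric values below are hypothetical and chosen only for illustration; the sketch assumes, as (1) implies, that the probabilities of leaving the CPU for the SPR and for the PPUs sum to one.

```python
def split_cpu_flow(a_cpu, p_spr, p_ppu):
    """Split the CPU departure rate of one ICS into its SPR and PPU branches.

    a_cpu : mean flow rate out of the CPU (a_CPUi)
    p_spr : transition probability CPU -> SPR (p_i,0)
    p_ppu : transition probabilities CPU -> PPU j, for j = 2..s_i
    """
    a_spr = p_spr * a_cpu                  # equation (2), SPR branch
    a_ppu = [p * a_cpu for p in p_ppu]     # equation (2), PPU branches
    return a_spr, a_ppu


def spr_arrival_rate(a_spr_per_ics):
    """Total SPR arrival rate: the sum of the branches from all R systems."""
    return sum(a_spr_per_ics)


# Illustrative values only: one ICS with two PPUs, probabilities summing to 1.
a_spr, a_ppu = split_cpu_flow(a_cpu=0.4, p_spr=0.25, p_ppu=[0.5, 0.25])
```

With probabilities that sum to one, the branch rates add back up to the CPU rate, which is the content of (1).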

Having established a flow relationship between devices, a relationship between the queueing network parameter K and the independent single server queueing parameter $a_{ij}$ is necessary. For an M/M/1 queue the mean queue length (including a job in service), given its mean arrival (a) and service (u) rates, is [KLIENR 75, KLIENR 76, COFFMA 73]

\[
Q = \frac{1}{1/\rho - 1} \tag{3}
\]

where $\rho = a/u$. By assuming each device is an independent M/M/1 single server queue we may use (2) to establish the following relation

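\[
K = \sum_{i=1}^{R} \Bigl\{ \frac{\alpha_i}{1/\rho_{SPR}-1} + \frac{1}{1/\rho_{CPU_i}-1} + \sum_{j=2}^{s_i} \frac{1}{1/\rho_{PPU_{ij}}-1} \Bigr\} \tag{4}
\]

\[
J_i = \frac{\alpha_i}{1/\rho_{SPR}-1} + \frac{1}{1/\rho_{CPU_i}-1} + \sum_{j=2}^{s_i} \frac{1}{1/\rho_{PPU_{ij}}-1} \tag{5}
\]

where

\[
\rho_{SPR} = \frac{\sum_{i=1}^{R} p_{SPR_i}\, a_{CPU_i}}{u_{SPR}}, \qquad
\rho_{PPU_{ij}} = \frac{p_{ij}\, a_{CPU_i}}{u_{ij}}, \qquad
\rho_{CPU_i} = \frac{a_{CPU_i}}{u_{CPU_i}}, \quad \text{and}
\]

\[
\alpha_i = \frac{p_{SPR_i}\, a_{CPU_i}}{\sum_{r=1}^{R} p_{SPR_r}\, a_{CPU_r}}
\]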


Assuming further that each ICS is identical (i.e. a balanced system), so that $\alpha_i = 1/R$, this then yields

\[
J_i = \frac{1/R}{1/\rho_{SPR}-1} + \frac{1}{1/\rho_{CPU_i}-1} + \sum_{j=2}^{s_i} \frac{1}{1/\rho_{PPU_{ij}}-1} \tag{6}
\]
B.   Computational  Algorithm  and  Performance  Measures


Given the queueing network parameters ($J_i$, $p_{ij}$, and $u_{ij}$) one may approximate the flow rate, $a_{CPU_i}$, by solving (6). Although (6) is an equation with a single unknown and not computationally complex, it does not lend itself to a closed form analytic solution. We shall present an algorithm, utilizing a bounded binary search technique, to efficiently solve (6) for $a_{CPU_i}$. Rewriting (6) results in


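\[
J_i = \frac{1/R}{1/(a_{CPU_i}\, x_{SPR}) - 1} + \frac{1}{1/(a_{CPU_i}\, x_{CPU_i}) - 1} + \sum_{j=2}^{s_i} \frac{1}{1/(a_{CPU_i}\, x_{PPU_{ij}}) - 1} \tag{7}
\]

where

\[
x_{SPR} = \frac{R\, p_{SPR_i}}{u_{SPR}}, \qquad
x_{PPU_{ij}} = \frac{p_{ij}}{u_{ij}}, \quad \text{and} \quad
x_{CPU_i} = \frac{1}{u_{CPU_i}}
\]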


Without loss of generality assign $u_{CPU} = 1$. This produces a normalizing effect, allowing all other service rates to be stated relative to this standard unit of service. A lower bound for the flow rate is zero. Each term of (7) is finite and positive only while the product of the flow rate and the corresponding x is less than one, so from inspection of (7) an upper bound is $1/\mathrm{MAX}[x_{SPR}, x_{CPU_i}, x_{PPU_{i,2}}, \ldots, x_{PPU_{i,s_i}}]$. Using these flow rate bounds, a binary search technique may be used to approach the flow rate that will satisfy (7) to within some arbitrary error $\delta$. This algorithm, the BIN algorithm, is stated more formally below. Because the systems are balanced (i.e. identical) for i = 1, ..., R, we perform the following on the arbitrarily chosen system i = 1.

BEGIN  BIN

STEP  1:  compute  initial  parameters

    x_SPR = (R p_SPR)/u_SPR
    x_CPU = 1/u_CPU
    STEP BY 1   j = 2   TO   s_1
        x_PPU_1j = p_1j/u_1j
    END

STEP  2:  set  initial  search  bounds

    low  = 0
    high = 1/MAX[x_SPR, x_CPU, x_PPU_12, ... , x_PPU_1s1]

STEP  3:  evaluate  at  midpoint  of  bounds

    mid = (low + high)/2
    val = (1/R)/(1/(mid x_SPR) - 1) + 1/(1/(mid x_CPU) - 1)
    STEP BY 1   j = 2   TO   s_1
        val = val + 1/(1/(mid x_PPU_1j) - 1)
    END

STEP  4:  convergence  test  and  adjust  bounds

    IF (|val - J_1| < delta)  GOTO STEP 5
    IF (val < J_1)  low  = mid
    IF (val > J_1)  high = mid
    GOTO STEP 3

STEP  5:  terminate  with  flow  rate = mid
    STOP
END  BIN
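The BIN steps can be sketched in Python as follows. This is an illustrative transcription under stated assumptions, not the Fortran program referred to in the text: the function names, the `max_iter` guard, and the example parameter values are hypothetical, each queue term is written as rho/(1 - rho) (algebraically equal to 1/(1/rho - 1) but defined at rho = 0), and the upper search bound is taken as the reciprocal of the largest x so that every device stays below saturation.

```python
def queue_sum(a, R, x_spr, x_cpu, x_ppu):
    """Right-hand side of (7): mean jobs per ICS at CPU flow rate a."""
    def q(x):
        rho = a * x
        if rho >= 1.0:                 # saturated device: queue grows without bound
            return float("inf")
        return rho / (1.0 - rho)       # algebraically equal to 1/(1/rho - 1)
    return q(x_spr) / R + q(x_cpu) + sum(q(x) for x in x_ppu)


def bin_search(J, R, p_spr, u_spr, p_ppu, u_ppu, u_cpu=1.0,
               delta=1e-9, max_iter=200):
    """Bisection for the CPU flow rate that satisfies (7) for one balanced ICS."""
    # STEP 1: initial parameters
    x_spr = R * p_spr / u_spr
    x_cpu = 1.0 / u_cpu
    x_ppu = [p / u for p, u in zip(p_ppu, u_ppu)]
    # STEP 2: search bounds; the flow rate must keep every a*x below 1
    low, high = 0.0, 1.0 / max([x_spr, x_cpu] + x_ppu)
    mid = 0.5 * (low + high)
    for _ in range(max_iter):
        # STEP 3: evaluate (7) at the midpoint
        mid = 0.5 * (low + high)
        val = queue_sum(mid, R, x_spr, x_cpu, x_ppu)
        # STEP 4: convergence test, then halve the search region
        if abs(val - J) < delta:
            break
        if val < J:
            low = mid
        else:
            high = mid
    return mid                         # STEP 5


# Illustrative parameters only: R = 2 systems, J = 4 jobs each, two PPUs.
a = bin_search(J=4, R=2, p_spr=0.25, u_spr=0.5,
               p_ppu=[0.5, 0.25], u_ppu=[0.2, 2.0])
```

Because the right-hand side of (7) increases monotonically from zero toward infinity over the search interval, the bisection brackets exactly one solution.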

The BIN algorithm requires storage for the vector X, containing $s_1+1$ elements, the convergence error, and the instantaneous solution along with its corresponding search region description (bounds and midpoint). Therefore, the total storage required for this algorithm is

\[ (s_1+1) + 1 + (1+3) = s_1+6 \]

The number of operations necessary to evaluate (7) using the BIN algorithm depends on the number of iterations required, which is a function of the convergence error. For $2^{-(n+1)} \leq \delta < 2^{-n}$, the maximum number of iterations is n. Each iteration requires $4 + 5s_1$ operations, thus requiring an operation count $O[\,n(4 + 5s_1)\,]$. Comparing these complexities with those in figure III-4 of chapter III, one can see the significant advantage of this approximation over the exact model.
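As a worked example of these counts, the short sketch below evaluates the iteration bound, the operation count, and the storage requirement for a given convergence error and number of devices. The function name `bin_cost` is hypothetical; n is obtained by inverting the stated inequality on the convergence error.

```python
import math


def bin_cost(delta, s):
    """Return (max iterations n, operation count n(4 + 5s), storage s + 6)."""
    # n is the integer with 2**-(n+1) <= delta < 2**-n
    n = math.ceil(-math.log2(delta)) - 1
    ops_per_iteration = 4 + 5 * s
    return n, n * ops_per_iteration, s + 6


# Illustrative case: convergence error 0.001 and s = 11 devices per ICS.
n, ops, storage = bin_cost(0.001, 11)
```

For this case the bound gives n = 9 iterations, 531 operations, and 17 storage locations.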
The BIN algorithm applied to (7) allows one to determine the job flow rate for a given set of queueing network parameters. Once this is done, the performance measures for each device may be easily computed using the following [COFFMA 73, KLIENR 75]:


Device busy probability

\[
\Pr[n_{ij} > 0] = \rho_{ij}, \quad ij > 0; \qquad \Pr[n_0 > 0] = \rho_{SPR} \tag{8}
\]

Device mean queue length

\[
Q_0 = \frac{1}{1/\rho_{SPR} - 1}; \qquad Q_{ij} = \frac{1}{1/\rho_{ij} - 1}, \quad ij > 0 \tag{9}
\]

Device throughput

\[
T_0 = u_{SPR}\, \rho_{SPR}; \qquad T_{ij} = u_{ij}\, \rho_{ij}, \quad ij > 0 \tag{10}
\]


Note that throughput may more properly be referred to as the effective service rate or departure rate of the device, which for an M/M/1 queue equals a, the arrival rate.
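A minimal sketch of (8) through (10) for a balanced system follows, assuming the CPU flow rate has already been found. The function name, the device labels used as dictionary keys, and the numeric values are hypothetical.

```python
def device_measures(a_cpu, R, p_spr, u_spr, p_ppu, u_ppu, u_cpu=1.0):
    """Busy probability, mean queue length and throughput per device.

    Devices are labelled "SPR", "CPU", "PPU2", "PPU3", ... for one ICS of a
    balanced system; a_cpu is the CPU flow rate of that ICS.
    """
    rates = {"SPR": u_spr, "CPU": u_cpu}
    rho = {"SPR": R * p_spr * a_cpu / u_spr,   # all R systems feed the SPR
           "CPU": a_cpu / u_cpu}
    for j, (p, u) in enumerate(zip(p_ppu, u_ppu), start=2):
        rates[f"PPU{j}"] = u
        rho[f"PPU{j}"] = p * a_cpu / u
    return {
        name: {
            "busy": r,                       # equation (8)
            "queue": r / (1.0 - r),          # equation (9), same as 1/(1/r - 1)
            "throughput": rates[name] * r,   # equation (10)
        }
        for name, r in rho.items()
    }


# Illustrative values only.
m = device_measures(a_cpu=0.2, R=2, p_spr=0.25, u_spr=0.5,
                    p_ppu=[0.5, 0.25], u_ppu=[0.2, 2.0])
```

In this sketch the throughput of each device equals its arrival rate, for example R p_SPR a_CPU at the SPR, in line with the note above.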

The  mean  cycle  time  of  a  job  is  the  mean  time  (wait  or  delay)  between  successive  requests  to 
the  CPU  by  the  same  job.  This  is  the  weighted  sum  of  the  mean  time  it  takes  a  job  to  be  serviced 
at  each  device.  Using  Little's  formula  ( W = Q/a)  this  may  be  computed  as 


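\[
\begin{aligned}
W_i &= e_{SPR_i} W_{SPR} + W_{CPU_i} + \sum_{j=2}^{s_i} e_{PPU_{ij}} W_{PPU_{ij}} \\
    &= \frac{e_{SPR_i} Q_{SPR}}{T_{SPR}} + \frac{Q_{CPU_i}}{a_{CPU_i}} + \sum_{j=2}^{s_i} \frac{e_{PPU_{ij}} Q_{PPU_{ij}}}{T_{PPU_{ij}}} \\
    &= \frac{e_{SPR_i} Q_{SPR}}{R\, p_{SPR_i}\, a_{CPU_i}} + \frac{Q_{CPU_i}}{a_{CPU_i}} + \sum_{j=2}^{s_i} \frac{e_{PPU_{ij}} Q_{PPU_{ij}}}{u_{ij}\, \rho_{PPU_{ij}}} \\
    &= \frac{1}{a_{CPU_i}} \Bigl\{ Q_{SPR}/R + Q_{CPU_i} + \sum_{j=2}^{s_i} Q_{PPU_{ij}} \Bigr\} \\
    &= J_i / a_{CPU_i}
\end{aligned}
\tag{11}
\]

where the relative visit frequency is (note: $u_{CPU} = 1$)

\[
e_{ij} =
\begin{cases}
p_{SPR_i}\, u_{CPU_i} = p_{SPR_i} & \text{for the SPR} \\
p_{PPU_{ij}}\, u_{CPU_i} = p_{PPU_{ij}} & \text{for a PPU}
\end{cases}
\]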
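The chain of equalities in (11) can be checked numerically. The sketch below is illustrative only: the function name and values are hypothetical, the system is assumed balanced with u_CPU = 1, and each waiting time is obtained from Little's formula before being weighted by the visit frequency.

```python
def cycle_time(a_cpu, R, p_spr, u_spr, p_ppu, u_ppu, u_cpu=1.0):
    """Return (W, J): the weighted sum of waiting times and the jobs per ICS."""
    def q(rho):
        return rho / (1.0 - rho)            # M/M/1 mean queue length, equation (3)

    q_spr = q(R * p_spr * a_cpu / u_spr)
    q_cpu = q(a_cpu / u_cpu)
    q_ppu = [q(p * a_cpu / u) for p, u in zip(p_ppu, u_ppu)]

    # Little's formula W = Q/T at each device, weighted by visit frequency e.
    w = p_spr * q_spr / (R * p_spr * a_cpu)               # SPR: T = R p_SPR a_CPU
    w += q_cpu / a_cpu                                    # CPU: T = a_CPU
    w += sum(p * qq / (p * a_cpu) for p, qq in zip(p_ppu, q_ppu))  # PPU: T = p a_CPU

    jobs = q_spr / R + q_cpu + sum(q_ppu)                 # right-hand side of (6)
    return w, jobs


# Illustrative values only.
w, jobs = cycle_time(a_cpu=0.2, R=2, p_spr=0.25, u_spr=0.5,
                     p_ppu=[0.5, 0.25], u_ppu=[0.2, 2.0])
```

The two returned values satisfy W = J/a_CPU, which is the last line of (11).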


An approximate analysis technique has been presented for a balanced SCS model which is much less complex to evaluate than existing efficient queueing network techniques. The question remains as to the error this approximation introduces, and a justification for the choice of an M/M/1 queue.

An M/M/1/K queue is an M/M/1 queue with a finite queue length, and intuitively would seem to better approximate the operations of the individual devices of a closed network. We have investigated the use of this well known queue, and typical results for device throughput and mean queue length are presented in figures IV-1 and IV-2. As can be seen from these figures, the M/M/1/K queue did not produce significantly better results than the M/M/1 queue for the examples considered. Generally, the most important aspect of using these models concerns when and how these curves react to variations in parameters. Little, if any, significance is associated with the absolute values of these curves, except on a relative basis. This implies that the primary importance of any approximation is in "tracking" the actual curve rather than replicating it. As can be seen from the figures, both the M/M/1 and the M/M/1/K approximations track the exact (SCS) queueing network results.
The M/M/1/K approximation does not yield a significantly better fit to the exact curve than does the M/M/1 approximation; therefore, the M/M/1 approximation was selected for the following reasons. First, as mentioned earlier, the major concern is tracking the exact curve and not duplicating it. Since both curves track well, choosing the best fit was not necessary. Second, the computational complexity of the M/M/1/K approximation is greater than that of the M/M/1. The M/M/1/K expression for the mean queue length [ALLEN 78] corresponding to (3) is

\[
Q = \frac{\rho\,[\,1-(K+1)\rho^{K} + K\rho^{K+1}\,]}{(1-\rho)(1-\rho^{K+1})}
  = \frac{1}{1/\rho-1} - \frac{K+1}{(1/\rho)^{K+1}-1}
\]

In  a  practical  situation  these  expressions  are  evaluated  by  a  computational  device  (computer 
or  calculator),  which  introduces  errors  due  to  the  use  of  approximation  algorithms  for 
exponentiation  and  the  lack  of  precision  (bits)  when  the  queue  approaches  saturation.  Also  the 
lower  computational  complexity  of  evaluating  (3)  allows  one  to  gain  insight  into  the  systems' 
operation  directly  from  the  form  of  the  equations. 
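The two forms of the M/M/1/K mean queue length, and their relation to (3), can be compared numerically. The function names below are hypothetical, and the sketch assumes a traffic intensity strictly between 0 and 1 (both M/M/1/K forms are undefined at exactly 1).

```python
def mm1_queue(rho):
    """Mean queue length of an M/M/1 queue, equation (3)."""
    return 1.0 / (1.0 / rho - 1.0)


def mm1k_queue(rho, K):
    """Mean queue length of an M/M/1/K queue, first (polynomial) form."""
    num = rho * (1.0 - (K + 1) * rho**K + K * rho**(K + 1))
    den = (1.0 - rho) * (1.0 - rho**(K + 1))
    return num / den


def mm1k_queue_alt(rho, K):
    """Second form: the M/M/1 value minus a finite-capacity correction."""
    return 1.0 / (1.0 / rho - 1.0) - (K + 1) / ((1.0 / rho)**(K + 1) - 1.0)


# Illustrative case: rho = 0.5 and capacity K = 4.
q_inf = mm1_queue(0.5)
q_k = mm1k_queue(0.5, 4)
q_k_alt = mm1k_queue_alt(0.5, 4)
```

The second form makes the extra cost visible: the M/M/1/K value is the M/M/1 value of (3) less a correction term that requires an exponentiation.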

Buzen [BUZEN 74] has compared the M/G/1 queue as an approximation to the M/G/1/K queue. He has determined that the relative error is small except for heavy traffic ($\rho \approx 1$) and small queue capacity ($K \approx 1$). Using arguments similar to the ones presented here, Buzen recommends the use of the M/G/1 as a reasonable approximation to the M/G/1/K queue.


C.    Error  Analysis  of  Approximation 


The sensitivity and magnitude of the error introduced by our approximate SCS model compared to the exact SCS model are investigated. Since the form of the exact model is so mathematically complex, a direct analytical comparison is not feasible. The alternative is to numerically evaluate the two models for corresponding parametric values and compare the results. The problem in attempting this is that the combination of all possible parametric values is infinitely large. Therefore, a reasonable and representative subset of values will be selected. Using Fortran programs developed to implement our algorithms, and this set of chosen parametric values, we will compare the performance measures presented in the previous section, specifically, the throughput and mean queue length of the CPU and SPR. Note that the throughput measure may more correctly be referred to as effective departure rate. Since the device busy probability performance measure is directly related to the throughput by a constant, comparing either one to the corresponding exact value would yield identical results. A balanced system (identical ICSs) is assumed for simplicity.

Due to the assumption of a balanced system we may drop the added burden of carrying extra subscripts to distinguish between individual ICSs, as the notation below indicates. This notation simplification results in previously defined vector elements (i.e. $J_i$ and $s_i$) now being denoted by their vector notation (i.e. J and s). For both the exact and approximate models the following are the pertinent parameters and their complete allowable ranges (see Appendix C for the simplified notation):

\[ 0 < R < \infty \]

\[ 0 < K = RJ = RJ_i < \infty, \quad i = 1, \ldots, R \]

\[ 0 < s = s_i < \infty, \quad i = 1, \ldots, R \]

\[ 0 < u_j = u_{ij} < \infty, \quad i = 1, \ldots, R \ \text{and} \ j = 0, \ldots, s \]

\[ 0 < p_j = p_{ij} < 1, \quad i = 1, \ldots, R \ \text{and} \ j = 0, \ldots, s \]

All of the parameters, with the exception of the transition probabilities, have an infinitely large range, as can be seen above. The selection of a small, finite subset of each of them to form a manageable sample space shall now be discussed.

Both the number of ICSs, R, and the number of jobs per ICS, J, have a significant impact on the computational complexity of the exact SCS model. From chapter III, the computation time complexity is $O[\,(J+1)^R\,]$. Because of our interest in modular expansion, R is felt to be slightly more important. Therefore, our selection put more emphasis on R than J. From initial testing and experimentation, we found that the processing time for the configuration of J = 2 and R = 8 on a DEC 10 computer was approximately 1.4 minutes. Based on this we selected 8 as the maximum value of R and subsequently 6 as the maximum value of J. Further selection of additional elements to construct representative sets large enough to provide insight into developing trends resulted in J = {2, 4, 6} and R = {1, 2, 5, 8}.

The  number  of  devices  within  an  ICS,  s,  does  not  present  a  significant  computational 
problem.  In  the  exact  model  all  of  the  devices  within  each  ICS  are  "collapsed"  into  a  single 
equivalent  device.  A  reasonable  upper  limit  might  be  11  devices  per  ICS,  comprising  a  CPU  and 
10  PPUs.  For  a  large  number  of  configurations  this  would  provide  for  a  sufficient  number  of 
PPUs.  For  these  configurations  and,  also  for  larger  ones,  a  representative  set  large  enough  to 
provide  insight  into  any  developing  trends  is  {2,  6, 11}. 

The  two  remaining  parameters,  processing  rates  and  transition  probabilities,  differ  from  the 
others  in  that  a  single  value  is  not  a  sufficient  specification.  A  group  of  values  for  each  parameter 
is  required,  one  for  each  device  within  an  ICS  as  well  as  the  SPR.  The  value  of  either  of  these 
parameters  does  not  itself  impact  the  computational  effort  required,  although  each  group  of  values 
requires  a  separate  computation,  as  does  a  change  of  any  parameter.  A  finite  subset  of  values  for 
each  of  these  two  parameters  will  be  first  selected,  and  a  procedure  to  be  used  to  assign  these 
values  to  the  devices  will  be  discussed. 

Since  the  processing  rate  of  the  CPU  has  been  fixed  at  unity,  all  other  processing  rates  are 
relative  to  the  CPU.  A  relative  range  spanning  3  orders  of  magnitude  from  .01  to  10.0  provides  a 
representative  range.  The  processing  rates  are  important  parameters  of  the  model.  Contrasting 
their  importance  is  the  need  to  minimize  the  sample  space.  As  a  compromise,  we  selected  a 
relatively  large  number  of  values,  10.  We  have  selected  the  set  {.01,  .02,  .05,  .1,  .2,  .5, 1.0,  2.0,  5.0, 
10.0}. 

Each  device  transition  probability  by  definition  is  bounded  between  0  and  1,  and  the  sum  of 
all  transition  probabilities  from  each  CPU  is  constrained  to  equal  unity.  A  representative  selection 
must  span  the  bounded  range,  therefore,  {.1,  .25,  .5,  .75,  .9}  has  been  selected. 

The assignment procedure we will follow is to select a transition probability value from the subset and assign it to the SPR, $p_{SPR}$. The remaining probability, $1 - p_{SPR}$, will be randomly distributed among the remaining s devices. At the same time that a device transition probability is assigned, the relative processing rate will also be assigned by random selection from the subset of relative processing rates.

The  details  of  this  procedure  are  discussed  here  and  the  algorithm  is  presented  below.  For 
each  device  divide  the  remaining  probability  into  two  groups.  The  first  group  is  a  reserve,  of  25%, 
to  assure  that  any  remaining  devices  are  allocated  some  probability.  The  other  group, 
representing  the  bulk  of  the  probability,  is  the  selection  range  for  the  current  device.  Generate  a 
random  number  in  the  continuous,  open  interval  (0,1),  from  a  uniform  probability  distribution. 
Multiply  this  fraction  by  the  upper  value  of  the  probability  selection  range.  The  resulting  value 
represents  the  transition  probability  to  be  assigned  to  the  current  device.  To  select  a  processing 
rate  for  the  device  generate  a  random  number  in  the  discrete  closed  integer  range  of  [1,10]  from  a 
uniform  distribution.  This  number  represents  the  corresponding  ordinal  element  in  the  relative 
processing  rate  subset  that  is  to  be  assigned  to  the  device. 

The assignment algorithm is:


BEGIN  ASSIGN(p_SPR)

    prob = 1.0 - p_SPR

    STEP BY 1   j = 1   TO   s
    DO
        p_j = .75*(prob)*Ranc[0,1]
        IF   j = 1
            THEN   u_j = 1.0
            ELSE   u_j = speed(Rand[1,10])
        prob = prob - p_j
    END

    p_s = p_s + prob
END  ASSIGN


where

Ranc[a,b]  is a function which generates a uniformly distributed random number
           in the continuous, open interval from a to b, and
Rand[1,n]  is a function which generates a uniformly distributed random integer
           in the discrete, closed interval from 1 to n.
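The ASSIGN steps can be sketched in Python as shown below. This is an illustration rather than the Fortran program mentioned in the text: the names `SPEEDS` and `assign` are hypothetical, and `random.Random.random()` draws from the half-open interval [0, 1) rather than the open interval (0, 1) that the text specifies for Ranc.

```python
import random

# Relative processing rates, in the ordinal order used by speed(Rand[1,10]).
SPEEDS = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]


def assign(p_spr, s, rng):
    """Return (p, u): transition probabilities and relative rates for s devices.

    Device j = 1 is the CPU, whose rate is fixed at 1.0. Each device draws
    from 75% of the probability still unassigned; the remaining 25% is held
    in reserve for the devices that follow.
    """
    prob = 1.0 - p_spr
    p, u = [], []
    for j in range(1, s + 1):
        p_j = 0.75 * prob * rng.random()
        p.append(p_j)
        u.append(1.0 if j == 1 else SPEEDS[rng.randint(1, 10) - 1])
        prob -= p_j
    p[-1] += prob      # the last device absorbs whatever probability is left
    return p, u


# Illustrative call with a fixed seed so that the draw is repeatable.
p, u = assign(p_spr=0.25, s=6, rng=random.Random(1))
```

Passing the generator in as an argument keeps the sketch repeatable; the final adjustment guarantees that the assigned probabilities and the SPR probability sum to one.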
We have now converted from an infinitely large population space to a reasonably sized sample space of 1800 combinations. The resulting sample space parameter values are:


R      =  {1, 2, 5, 8}
J      =  {2, 4, 6}
s      =  {2, 6, 11}
u_SPR  =  {.01, .02, .05, .1, .2, .5, 1., 2., 5., 10.}
u_CPU  =  1.
u_j    =  a random selection from the same set as u_SPR,  j = 2, ... , s
p_SPR  =  {.1, .25, .5, .75, .9}
p_j    =  a random selection from the same set as p_SPR,  j = 1, ... , s

Using our Fortran implementation of both models, values for the throughput and mean queue length of both the CPU and SPR have been generated in the following manner. For each of the 15 combinations of s x p_SPR, a corresponding group of values for p_j and u_j was generated, giving 15 groups in all. A program, based on the ASSIGN algorithm above, was constructed in Fortran to do this, and its results are listed in appendix D. The entire 120 combinations of u_SPR x R x J were used 15 times, once for each of the 15 groups of transition probabilities and relative processing rates.
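The size of the sample space follows directly from the sets listed above, as the following sketch shows. The variable names are hypothetical; only the set contents are taken from the text.

```python
from itertools import product

R_SET = (1, 2, 5, 8)
J_SET = (2, 4, 6)
S_SET = (2, 6, 11)
U_SPR_SET = (0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0)
P_SPR_SET = (0.1, 0.25, 0.5, 0.75, 0.9)

# One group of (p_j, u_j) values is generated per (s, p_SPR) combination.
groups = list(product(S_SET, P_SPR_SET))

# Each group is evaluated for every (u_SPR, R, J) combination.
configs = list(product(U_SPR_SET, R_SET, J_SET))

total_points = len(groups) * len(configs)
```

This gives 15 groups, 120 configurations per group, and 1800 data points in total.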

Table IV-1 contains the accounting statistics on the actual CPU processing times for both the exact and approximate SCS models executed on a DEC 10 computer. Both models were executed in a batch environment with all input data completely specified in advance in a file. The times are based on an execution unit which computes a set of 40 data points. This represents one value from the J set, one group of transition probability and relative processing rate values, and the entire 40 combinations of the u_SPR x R set.

40-data-point                          Exact                     Approximate
execution unit                         SCS model                 SCS model

J = 2      CPU time                    1.4 min                   2.9 sec
           Elapsed time                2.2 min                   3.5 sec

J = 4      CPU time                    13.0 min                  2.9 sec
           Elapsed time                14.7 min                  3.5 sec

J = 6      CPU time                    148 min                   2.9 sec
           Elapsed time                178 min                   3.5 sec

TOTAL for  CPU time                    2436 min = 40.6 hrs       131 sec
1800       Elapsed time                2924 min = 48.8 hrs       158 sec
points

Table IV-1. Execution unit processing times.

The processing time for an execution unit of the exact SCS model should be $10\,\Delta t\; O[\,(J+1)^1 + (J+1)^2 + (J+1)^5 + (J+1)^8\,]$, where $\Delta t$ is the average time per operation. The high order term dominates the expression, which may be approximated by $10\,\Delta t\; O[\,(J+1)^8\,]$. From this the expected relative processing time of an execution unit is $T_b\,(J+1)^8/(J_b+1)^8$, where $J_b$ is a base reference value of jobs per ICS and $T_b$ is the corresponding average measured processing time. For the sample, $J_b = 2$ and $T_b$ is approximately 1.4 minutes. Therefore, for an execution unit of J = 6 the processing time is $(6+1)^8/(2+1)^8 = (7/3)^8 \approx 878$ times longer than for the $J_b = 2$ execution unit, or 1229 minutes. Since the system is a balanced one (all ICSs are identical), the actual computations need only be carried out for one ICS, rather than for all R. This reduces the number of computations to approximately 1/R of the original count, resulting in a revised increase in processing time of 109 (vs. 878) times the $J_b = 2$ execution unit, or 152 minutes. This agrees reasonably well with the average measured value of 148 minutes. Similarly, for the J = 4 execution unit an increase of about 7.5 times is predicted, or 10.5 minutes compared with the average measured value of 13.0 minutes. These measurements verify the relations developed in chapter III for the number of operations required to compute the performance measures for any given SCS system configuration. The majority of the error is attributed to approximating this relationship by only its dominant term.
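The scaling argument can be reproduced with a few lines of arithmetic. The function name and its keyword arguments are hypothetical; the sketch simply applies the dominant-term ratio and the 1/R reduction described in the text, so its results differ slightly from the rounded figures quoted there.

```python
def predicted_unit_time(J, J_base=2, T_base=1.4, R_max=8):
    """Predicted processing time of a 40-point execution unit.

    Returns (ratio, minutes): the increase relative to the base unit after
    the 1/R reduction for balanced systems, and the resulting time in minutes.
    """
    dominant_ratio = ((J + 1) / (J_base + 1)) ** R_max   # (J+1)^8 / (J_b+1)^8
    ratio = dominant_ratio / R_max                       # only one ICS is computed
    return ratio, ratio * T_base


ratio_6, minutes_6 = predicted_unit_time(6)   # compare with 148 min measured
ratio_4, minutes_4 = predicted_unit_time(4)   # compare with 13.0 min measured
```

The unrounded ratios are roughly 110 for J = 6 and 7.4 for J = 4, in line with the values of 109 and about 7.5 used in the text.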

In  contrast  to  these  exponentially  increasing  processing  times  on  the  order  of  minutes  and 
hours,  the  processing  times  of  our  approximate  SCS  model  are  on  the  order  of  seconds,  and  for 
balanced  systems  are  independent  of  R  and  J.  This  is  verified  by  the  average  measured  processing 
times  in  table  IV-1,  and  by  examination  of  (7)  and  the  BIN  algorithm  used  for  its  solution. 

Tables IV-2 through IV-5 contain relative error ( = {exact value - approximate value}/exact value ) statistics produced from the results of computing the performance measures from all the sample space parameters for both models. These error statistics consist of mean, variance, minimum, and maximum values for each of the individual parameters and for all the parameters together. Each of the relative error values is further organized as a function of $\rho_{SPR}$, the traffic intensity of the SPR. The relative error statistics are listed in pairs, first for all values of $\rho_{SPR}$ and second for $\rho_{SPR} < .9$. Each table consists of 6 subtables. The first (top) subtable contains the overall statistics for the indicated performance measure. The heading of the first column of each of the remaining 5 subtables indicates the parameter being investigated within that subtable. Each row of a subtable represents the statistics for a single value of the parameter being investigated, with all other parameters varied through their complete sample-space ranges. The first column contains the value of the parameter, the second column contains the number of data points used to compute the statistics, and the remaining columns contain the statistics as indicated.
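The statistics in each row can be computed as sketched below. The function name and sample values are hypothetical, and the use of the population variance is an assumption, since the text does not say whether the population or the sample variance was tabulated.

```python
from statistics import mean, pvariance


def relative_error_stats(exact, approx):
    """Summary statistics of (exact - approximate)/exact over paired values."""
    errors = [(e - a) / e for e, a in zip(exact, approx)]
    return {
        "points": len(errors),
        "mean": mean(errors),
        "variance": pvariance(errors),   # population variance (an assumption)
        "min": min(errors),
        "max": max(errors),
    }


# Illustrative data only, not values from the tables.
stats = relative_error_stats([1.0, 2.0, 4.0], [0.9, 2.2, 3.0])
```

A positive relative error means the approximate model underestimates the exact value, and a negative one means it overestimates it.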
p        Points       Mean              Variance          Minimum             Maximum
all      1650/1158    0.0904/0.1171     0.0189/0.0221     -0.4489/-0.3824     0.3636/0.3636

p_SPR    Points       Mean              Variance          Minimum             Maximum
0.10     330/266      0.1030/0.1194     0.0139/0.0151     -0.2649/-0.1949     0.3396/0.3396
0.25     330/260      0.1237/0.1506     0.0199/0.0205     -0.3698/-0.3062     0.3529/0.3529
0.50     330/208      0.0139/0.0049     0.0211/0.0272     -0.3648/-0.3017     0.3500/0.3500
0.75     330/227      0.1590/0.2124     0.0103/0.0040      0.0000/0.0330      0.3382/0.3382
0.90     330/197      0.0525/0.0781     0.0162/0.0224     -0.4489/-0.3824     0.3636/0.3636

J        Points       Mean              Variance          Minimum             Maximum
2.00     600/458      0.1119/0.1308     0.0243/0.0286     -0.3824/-0.3824     0.3636/0.3636
4.00     600/411      0.0801/0.1132     0.0173/0.0193     -0.4489/-0.3062     0.3333/0.2619
6.00     450/289      0.0754/0.1009     0.0128/0.0152     -0.3698/-0.1949     0.2330/0.2330

s        Points       Mean              Variance          Minimum             Maximum
2.00     550/387      0.0808/0.1048     0.0209/0.0250     -0.4489/-0.3824     0.3636/0.3636
6.00     550/389      0.1180/0.1526     0.0140/0.0140     -0.3028/-0.2454     0.3636/0.3636
11.00    550/382      0.0724/0.0933     0.0205/0.0255     -0.3698/-0.3062     0.3636/0.3636

R        Points       Mean              Variance          Minimum             Maximum
1.00     450/450      0.1446/0.1446     0.0206/0.0206     -0.1744/-0.1744     0.3636/0.3636
2.00     450/383      0.1059/0.1100     0.0155/0.0180     -0.1765/-0.1765     0.3429/0.3429
5.00     450/213      0.0512/0.0846     0.0152/0.0247     -0.3698/-0.3062     0.3333/0.3333
8.00     300/112      0.0448/0.0923     0.0178/0.0318     -0.4489/-0.3824     0.3429/0.3429

u_SPR    Points       Mean              Variance          Minimum             Maximum
0.01     165/76       0.1312/0.2082     0.0138/0.0065      0.0000/0.0909      0.3636/0.3636
0.02     165/79       0.1072/0.1925     0.0104/0.0059      0.0000/0.0769      0.3529/0.3529
0.05     165/85       0.1178/0.1916     0.0090/0.0051      0.0000/0.0857      0.3458/0.3458
0.10     165/92       0.1207/0.1820     0.0086/0.0065     -0.0295/-0.0295     0.3471/0.3471
0.20     165/100      0.1158/0.1662     0.0099/0.0091     -0.1424/-0.1424     0.3333/0.3333
0.50     165/120      0.0849/0.1127     0.0148/0.0166     -0.2649/-0.1949     0.3333/0.3333

1.00 

165/  131 

0.0543/  0.0760 

0.0235/  0.0259 

-0.3698/-0.3062 

0.3333/0.3333 

2.00 

165/  147 

0.0423/  0.0627 

0.0322/  0.0312 

-0.3648/-0.3017 

0.3333/0.3333 

5.00 

165/  163 

0.0579/  0.0633 

0.0317/  0.0296 

-0.4489/-0.3824 

0.3333/0.3333 

10.00 

165/  165 

0.0719/  0.0719 

0.0264/  0.0264 

-0.1689/-0.1689 

0.3333/  0.3333 

Table  IV-2.  Relative  Error  statistics  for  QCPU ,  for  both  all  pip  <  .90  . 
ρ       Points     Mean             Variance       Minimum          Maximum
-       1650/1158  0.1551/0.1968    0.0066/0.0034  0.0000/0.0983    0.3383/0.3383

ρ_SPR   Points     Mean             Variance       Minimum          Maximum
0.10    330/266    0.1627/0.1879    0.0039/0.0015  0.0300/0.1025    0.3347/0.3347
0.25    330/260    0.1782/0.2110    0.0074/0.0042  0.0300/0.1000    0.3371/0.3371
0.50    330/208    0.1388/0.1868    0.0060/0.0031  0.0250/0.0990    0.3367/0.3367
0.75    330/227    0.1592/0.2055    0.0078/0.0041  0.0189/0.0983    0.3383/0.3383
0.90    330/197    0.1365/0.1905    0.0069/0.0039  0.0000/0.1005    0.3378/0.3378

J       Points     Mean             Variance       Minimum          Maximum
2.00    600/458    0.2026/0.2418    0.0079/0.0038  0.0400/0.0992    0.3383/0.3383
4.00    600/411    0.1378/0.1804    0.0048/0.0011  0.0241/0.0983    0.2156/0.2156
6.00    450/289    0.1147/0.1486    0.0024/0.0002  0.0000/0.0990    0.1642/0.1642

s       Points     Mean             Variance       Minimum          Maximum
2.00    550/387    0.1611/0.2051    0.0080/0.0046  0.0000/0.0988    0.3383/0.3383
6.00    550/389    0.1527/0.1922    0.0056/0.0024  0.0000/0.0992    0.3383/0.3383
11.00   550/382    0.1513/0.1930    0.0063/0.0031  0.0000/0.0983    0.3378/0.3378

R       Points     Mean             Variance       Minimum          Maximum
1.00    450/450    0.2068/0.2068    0.0037/0.0037  0.1169/0.1169    0.3383/0.3383
2.00    450/383    0.1687/0.1840    0.0037/0.0028  0.0721/0.0990    0.3362/0.3362
5.00    450/213    0.1180/0.1880    0.0063/0.0033  0.0000/0.0983    0.3371/0.3371
8.00    300/112    0.1125/0.2166    0.0078/0.0030  0.0241/0.0988    0.3371/0.3371

μ_SPR   Points     Mean             Variance       Minimum          Maximum
0.01    165/76     0.1238/0.1977    0.0074/0.0052  0.0000/0.1000    0.3383/0.3383
0.02    165/79     0.1216/0.1934    0.0075/0.0051  0.0189/0.1025    0.3378/0.3378
0.05    165/85     0.1259/0.1920    0.0073/0.0046  0.0241/0.1025    0.3376/0.3376
0.10    165/92     0.1316/0.1908    0.0069/0.0041  0.0288/0.1011    0.3362/0.3362
0.20    165/100    0.1377/0.1899    0.0062/0.0030  0.0300/0.0994    0.3371/0.3371
0.50    165/120    0.1474/0.1827    0.0054/0.0027  0.0303/0.0983    0.3362/0.3362
1.00    165/131    0.1618/0.1896    0.0053/0.0028  0.0317/0.0992    0.3352/0.3352
2.00    165/147    0.1835/0.1986    0.0044/0.0028  0.0335/0.0988    0.3352/0.3352
5.00    165/163    0.2061/0.2078    0.0030/0.0028  0.0562/0.1387    0.3352/0.3352
10.00   165/165    0.2113/0.2113    0.0029/0.0029  0.1387/0.1387    0.3359/0.3359

Table IV-3. Relative error statistics for T_CPU; each entry is listed as all ρ_SPR / ρ_SPR < .90.
ρ       Points     Mean             Variance       Minimum          Maximum
-       1650/1158  0.1042/0.1459    0.0195/0.0215  -0.1595/-0.1595  0.5792/0.5792

ρ_SPR   Points     Mean             Variance       Minimum          Maximum
0.10    330/266    0.1323/0.1623    0.0160/0.0148  -0.1595/-0.1595  0.5000/0.5000
0.25    330/260    0.1532/0.1920    0.0232/0.0220  -0.1229/-0.1229  0.5401/0.5401
0.50    330/208    0.0659/0.1013    0.0138/0.0178  -0.1529/-0.1529  0.4651/0.4651
0.75    330/227    0.1095/0.1566    0.0219/0.0244  -0.1205/-0.1205  0.5792/0.5792
0.90    330/197    0.0603/0.0980    0.0161/0.0230  -0.1507/-0.1507  0.5449/0.5449

J       Points     Mean             Variance       Minimum          Maximum
2.00    600/458    0.1159/0.1519    0.0253/0.0277  -0.1595/-0.1595  0.5449/0.5449
4.00    600/411    0.1049/0.1483    0.0187/0.0206  -0.1291/-0.1291  0.5792/0.5792
6.00    450/289    0.0879/0.1332    0.0124/0.0127  -0.1216/-0.1216  0.5091/0.5091

s       Points     Mean             Variance       Minimum          Maximum
2.00    550/387    0.1188/0.1658    0.0227/0.0243  -0.1338/-0.1338  0.5449/0.5449
6.00    550/389    0.0993/0.1386    0.0175/0.0192  -0.1595/-0.1595  0.5000/0.5000
11.00   550/382    0.0946/0.1333    0.0180/0.0204  -0.1594/-0.1594  0.5792/0.5792

R       Points     Mean             Variance       Minimum          Maximum
1.00    450/450    0.0895/0.0895    0.0180/0.0180  -0.1595/-0.1595  0.5000/0.5000
2.00    450/383    0.1037/0.1227    0.0168/0.0173  -0.1177/-0.1177  0.5000/0.5000
5.00    450/213    0.1097/0.2240    0.0196/0.0154  -0.0273/-0.0273  0.5091/0.5091
8.00    300/112    0.1190/0.3039    0.0251/0.0104  -0.0056/0.0645   0.5792/0.5792

μ_SPR   Points     Mean             Variance       Minimum          Maximum
0.01    165/76     -0.0054/-0.0109  0.0005/0.0011  -0.1595/-0.1595  0.0901/0.0901
0.02    165/79     -0.0003/-0.0001  0.0033/0.0069  -0.1594/-0.1594  0.3029/0.3029
0.05    165/85     0.0178/0.0325    0.0102/0.0188  -0.1338/-0.1338  0.5026/0.5026
0.10    165/92     0.0321/0.0570    0.0126/0.0212  -0.1229/-0.1229  0.5379/0.5379
0.20    165/100    0.0521/0.0832    0.0164/0.0241  -0.1529/-0.1529  0.4294/0.4294
0.50    165/120    0.0975/0.1316    0.0198/0.0224  -0.1507/-0.1507  0.4998/0.4998
1.00    165/131    0.1548/0.1888    0.0156/0.0134  -0.0870/-0.0870  0.5792/0.5792
2.00    165/147    0.2159/0.2351    0.0104/0.0077  -0.0004/-0.0004  0.5401/0.5401
5.00    165/163    0.2514/0.2517    0.0077/0.0078  0.0000/0.0000    0.5091/0.5091
10.00   165/165    0.2267/0.2267    0.0088/0.0088  0.0000/0.0000    0.5449/0.5449

Table IV-4. Relative error statistics for Q_SPR; each entry is listed as all ρ_SPR / ρ_SPR < .90.
ρ       Points     Mean             Variance       Minimum          Maximum
-       1650/1158  0.1549/0.1967    0.0067/0.0034  0.0300/0.0988    0.3409/0.3409

ρ_SPR   Points     Mean             Variance       Minimum          Maximum
0.10    330/266    0.1627/0.1879    0.0040/0.0015  0.0300/0.1000    0.3367/0.3367
0.25    330/260    0.1782/0.2112    0.0075/0.0042  0.0300/0.1000    0.3409/0.3409
0.50    330/208    0.1387/0.1868    0.0061/0.0031  0.0300/0.0990    0.3400/0.3400
0.75    330/227    0.1586/0.2052    0.0078/0.0042  0.0300/0.0988    0.3400/0.3400
0.90    330/197    0.1361/0.1900    0.0069/0.0039  0.0300/0.1006    0.3400/0.3400

J       Points     Mean             Variance       Minimum          Maximum
2.00    600/458    0.2025/0.2419    0.0080/0.0038  0.0590/0.0989    0.3409/0.3409
4.00    600/411    0.1371/0.1802    0.0049/0.0012  0.0300/0.0988    0.2145/0.2145
6.00    450/289    0.1150/0.1486    0.0023/0.0002  0.0300/0.0990    0.1659/0.1659

s       Points     Mean             Variance       Minimum          Maximum
2.00    550/387    0.1611/0.2052    0.0080/0.0046  0.0300/0.0989    0.3409/0.3409
6.00    550/389    0.1525/0.1920    0.0057/0.0025  0.0300/0.0989    0.3400/0.3400
11.00   550/382    0.1511/0.1928    0.0063/0.0031  0.0300/0.0988    0.3400/0.3400

R       Points     Mean             Variance       Minimum          Maximum
1.00    450/450    0.2070/0.2070    0.0037/0.0037  0.1175/0.1175    0.3409/0.3409
2.00    450/383    0.1684/0.1836    0.0037/0.0028  0.0750/0.0990    0.3371/0.3371
5.00    450/213    0.1182/0.1880    0.0062/0.0033  0.0300/0.0988    0.3364/0.3364
8.00    300/112    0.1115/0.2166    0.0079/0.0030  0.0300/0.0989    0.3357/0.3357

μ_SPR   Points     Mean             Variance       Minimum          Maximum
0.01    165/76     0.1208/0.1959    0.0077/0.0056  0.0300/0.1000    0.3400/0.3400
0.02    165/79     0.1219/0.1936    0.0074/0.0051  0.0300/0.1000    0.3400/0.3400
0.05    165/85     0.1264/0.1919    0.0072/0.0045  0.0300/0.1040    0.3367/0.3367
0.10    165/92     0.1316/0.1906    0.0069/0.0041  0.0300/0.1000    0.3347/0.3347
0.20    165/100    0.1378/0.1900    0.0062/0.0030  0.0305/0.1010    0.3409/0.3409
0.50    165/120    0.1475/0.1828    0.0054/0.0027  0.0308/0.0988    0.3409/0.3409
1.00    165/131    0.1618/0.1896    0.0053/0.0028  0.0315/0.0989    0.3409/0.3409
2.00    165/147    0.1835/0.1987    0.0044/0.0028  0.0333/0.0989    0.3409/0.3409
5.00    165/163    0.2061/0.2078    0.0030/0.0028  0.0563/0.1387    0.3409/0.3409
10.00   165/165    0.2114/0.2114    0.0029/0.0029  0.1387/0.1387    0.3409/0.3409

Table IV-5. Relative error statistics for T_SPR; each entry is listed as all ρ_SPR / ρ_SPR < .90.
We would like to mention an additional limitation that has not, to our knowledge, been discussed in the literature. This limitation is related to the size limitation as it affects the precision of the specific implementation. We noticed that some erratic values from the exact model were occurring for points within our sample space whose computational complexity was the highest, namely J = 6 and R = 8. We conjecture that because of the large number of floating point operations required to evaluate the exact model at these points, some combination of accumulated round-off, overflow, or underflow errors was the cause. These erratic values were not observed for a similar implementation on a CDC 6000 series machine with a 60 bit word length. The current implementation uses a DEC 10 machine with a 36 bit word length. A possible solution for this case may be to use double precision variables rather than the single precision variables used in the current implementation. The author no longer has access to the former machine, and due to the length of the computations involved was not able to pursue this any further at this time. As a result we have eliminated these 150 data points, thereby reducing our sample space from 1800 to 1650 data points.
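The kind of accumulated round-off conjectured above can be illustrated with a small experiment. The sketch below is an assumption-laden illustration, not a reproduction of the original runs: it uses IEEE single and double precision (rounding through Python's `struct` module), which differ from the DEC 10 and CDC 6000 floating point formats, and it only shows that a long chain of additions loses far more accuracy when every intermediate result is rounded to the shorter format.

```python
import struct

def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(term, count, single=False):
    """Add `term` to a running total `count` times.

    With single=True every intermediate total is rounded to single precision,
    imitating a computation carried out entirely in the shorter word length.
    """
    total = 0.0
    step = to_single(term) if single else term
    for _ in range(count):
        total += step
        if single:
            total = to_single(total)
    return total

if __name__ == "__main__":
    exact = 10000.0
    print("single:", abs(accumulate(0.1, 100000, single=True) - exact))
    print("double:", abs(accumulate(0.1, 100000) - exact))
```

The exact SCS model involves many more operations per data point than this loop, so the same effect would be expected to grow with J and R.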

Figures IV-3 through IV-6 are scatter plots (left) of the relative error for each performance measure along with its corresponding mean value plot (right) for all 1650 data points in the sample space as a function of ρ_SPR. A scatter plot consists of the true plotting of all the points, wherever they fall, generally scattered. Each discrete plotted point consists of a digit representing the number of actual points encompassed by it. An asterisk (*) represents 10 or more points. Figures IV-7 through IV-16 are representative scatter and mean value plots of the relative error for the SPR throughput and mean queue length as a function of ρ_SPR for one value of each of the five parameters. Figures IV-17 through IV-20 are representative plots of the throughput and mean queue length of the CPU and SPR vs. the processing rate of the SPR, μ_SPR, as computed by both the exact and approximate SCS models.
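The plotting convention described above (a digit for the number of points falling on one plot position, an asterisk for 10 or more) can be sketched as follows. The grid size, axis ranges, and function names are hypothetical choices for illustration; the original plotting program is not described in the text.

```python
def plot_symbol(count):
    """Character for one plot cell: blank, a digit 1-9, or '*' for 10 or more."""
    if count <= 0:
        return " "
    return str(count) if count < 10 else "*"

def scatter_rows(points, x_bins, y_bins, x_range, y_range):
    """Bin (x, y) points into a character grid; the first row is the largest y.

    Points are assumed to lie inside the given ranges.
    """
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    counts = [[0] * x_bins for _ in range(y_bins)]
    for x, y in points:
        col = min(int((x - x_lo) / (x_hi - x_lo) * x_bins), x_bins - 1)
        row = min(int((y - y_lo) / (y_hi - y_lo) * y_bins), y_bins - 1)
        counts[row][col] += 1
    return ["".join(plot_symbol(c) for c in row) for row in reversed(counts)]
```

For example, plotting relative error against ρ_SPR with `x_range=(0.0, 1.0)` would give one text line per error band.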

As can be seen from inspection of the tables and plots, the throughput relative error for both the CPU and SPR follows a fairly narrow channel centered approximately at .20 (20%) for low to moderate traffic. As the traffic intensity becomes heavy (ρ_SPR > .80) and approaches saturation (ρ_SPR ~ 1), the relative error tends to become small. This is consistent with results obtained by Buzen [BUZEN 74] in his use of single server approximations. It should be mentioned that in applying this approximate model, if this saturation condition occurs, one immediately knows that this device is a bottleneck and is causing serious problems.
From these relative error plots and statistics tables of the individual parameters, one can see that this approximation does possess some parameter sensitivity. The throughput performance measure does not indicate any sensitivity trends for the ρ_SPR and s parameters. For the J parameter a significant decreasing trend in mean and variance of the relative error is observed as J increases. No definite trend for the R parameter can readily be detected: although the overall mean does decrease as R increases, the variance and the mean for those points not near device saturation do not. The ρ_SPR parameters are observed to behave similarly. As a result, the throughput performance measure exhibits some sensitivity to the J and R parameters.

In contrast to the throughput performance measure, the mean queue length has lower relative error values, approximately 12% for the CPU and 15% for the SPR. The mean queue lengths predicted by the approximate model underestimate those of the exact model for low values of ρ_SPR and overestimate them for high values of ρ_SPR. This accounts for the lower mean relative error values. In a manner similar to throughput, as saturation is approached the relative error becomes small. The variability of this relative error is higher than that for throughput, approximately ±14% vs. ±6% when expressed as a standard deviation. This can be seen from the scatter plots, figures IV-3 through IV-16, especially for mid-range values of ρ_SPR. This implies that the mean queue length performance measure is more sensitive to our approximation than is the throughput performance measure. This is consistent with results obtained by Buzen [BUZEN 77] in his use of single server approximations.

Observations of the limited relative error statistics from the tables and plots indicate some trends and sensitivities, although they are inconclusive. The mean queue length performance measure does indicate a decreasing sensitivity to the s parameter as this parameter increases. This may be credited to these PPUs handling a larger portion of the workload and, therefore, the CPU and SPR being less heavily loaded. Neither device indicates any sensitivity to the ρ_SPR parameter. The J parameter indicates a definite sensitivity trend in both mean and variance: as J increases the relative error decreases. Both devices indicate a trend for the μ_SPR parameter, but in opposite directions. The CPU demonstrates a lower mean and variance as μ_SPR increases, but then as μ_SPR becomes relatively fast the mean and variance tend to reverse and increase. The SPR demonstrates a totally opposite response: the mean and variance increase with increasing μ_SPR and then decrease. For the R parameter both devices demonstrate a trend, but again exhibit opposite reactions. The CPU exhibits a decreasing mean as R increases, while the SPR trend is an increasing one. The variance of both devices does not indicate a clear trend for this parameter. Therefore, the mean queue length performance measure demonstrates a higher sensitivity to the J and R parameters than the throughput performance measure, and is also sensitive to the μ_SPR parameter. In addition, the CPU and SPR demonstrate opposite sensitivity for the μ_SPR and R parameters.

An  error  analysis  of  the  approximation  has  been  presented  to  aid  designers  and  analysts  when 
they  apply  this  approximation.    Although  the  error  analysis  is  by  no  means  elaborate  or 
conclusive,  some  preliminary  trends  and  sensitivities  have  been  identified.  This  by  far  exceeds 
the  error  analysis  presented  to  support  other  approximations  in  the  literature.   Further  work  to 
establish  an  accurate  error  function  which  incorporates  all  these  parameters  is  still  needed. 

In conclusion, we feel that the approximate SCS model provides reasonable results with small computational requirements. The approximate model computation times are on the order of a few seconds vs. minutes, hours, or even days for the exact model. The throughput values computed by the approximate model are always less than or equal to the exact values and on the average 20% less, ±6%. While the mean queue length underestimates the exact value for low ρ_SPR and overestimates it for high values of ρ_SPR, its average relative error is approximately 12% to 15%, ±14%, with the greatest variation occurring in the mid-range of ρ_SPR.
V.   ANALYSIS  OF  MODULAR  EXPANSION


A.   Exact  Analysis 


In  chapter  III,  a  queueing  network  model  was  developed  for  an  architecture  consisting  of 
independent computing systems (ICSs) sharing a single device. In chapter IV, a much less
complex  approximate  model  for  this  type  of  architecture  was  introduced.  Of  interest  for  this 
architecture  is  the  effect  when  the  system  is  incrementally  expanded  by  the  addition  of  ICSs. 
Expansion  of  this  type  places  a  heavier  load  on  the  shared  device,  causing  degraded  service  to  each 
ICS.  This  introduces  a  dual  problem.  First,  for  a  given  configuration  and  a  specific  expansion, 
what  is  the  degradation  in  service  that  results?  This  can  be  determined  by  using  either  of  the 
models  to  compute  any  of  the  previously  discussed  performance  measures  for  both  the  before  and 
after  cases.  By  comparing  these  measures  against  each  other,  as  well  as  the  requirements  of  the 
facility,  one  may  determine  if  the  degradation  is  significant  and  acceptable. 

The  second  problem  occurs  when  it  is  determined  that  the  degradation  is  not  acceptable.  The 
alternatives  then  are  to  either  forego  the  expansion  or  augment  the  shared  device  to  increase  its 
processing  rate.  The  problem  is  then  one  of  determining  the  amount  by  which  the  processing  rate 
of  the  shared  device  must  be  increased  to  maintain  the  current  level  of  service  being  delivered  to 
each  of  the  ICSs. 

Both  the  exact  and  approximate  models  are  used  to  develop  corresponding  relationships 
between  adding  ICSs  and  increasing  the  processing  rate  of  the  shared  device.  A  before  and  after 
comparison  of  a  response  performance  measure  for  a  modular  expansion  of  a  balanced  system  is 
considered. The performance measure of interest here is the mean cycle time, the mean time of a renewal interval. This interval begins when a job enters the CPU queue and terminates when that same job next enters the CPU queue. It is a measure of the average time spent at each device (both waiting and being processed), weighted by the probability of visiting that device. Each job will, in general, require many cycles to complete its processing task; of concern here is the mean time of one such cycle.
From Little's result (W = Q/a), the mean wait time (W) spent at a device (in queue and processing) can be determined if the mean queue length (Q) and the mean arrival rate (a) of that device are known. The mean queue lengths for the exact SCS model, from (40) and (41) of chapter III, are

$$Q_{ij} = \frac{1}{G(J)} \sum_{k=1}^{J_i} x_{ij}^{k}\, G(J - k\,d_i), \quad \text{for } i,j \neq 0, \text{ and} \tag{1}$$

$$Q_0 = \frac{1}{G(J)} \sum_{k=1}^{K} k!\,k \Big\{ \sum_{\sum_{i=1}^{R} n_{i,0} = k} \ \prod_{r=1}^{R} h(r; J_r - n_{r,0}) \Big\}. \tag{2}$$

The departure rate of a server whose service process is exponential is equal to its arrival rate [BURKE 56, FINCH 59, BURKE 72, MUNTZ 73, KLIENR 76]. As noted before, the throughput performance measure is actually the device departure rate, and from (43) and (44) of chapter III is

$$T_{ij} = \mu_{ij} A_{ij} = \frac{G(J - d_i)}{G(J)}\, e_{ij}, \quad \text{for } i,j \neq 0, \text{ and} \tag{3}$$

$$T_0 = \mu_0 A_0 = \mu_0 \Big\{ 1 - \frac{\prod_{i=1}^{R} g_i(J_i)}{G(J)} \Big\}. \tag{4}$$

Therefore, an expression for the wait time at a device is

$$W_{ij} = Q_{ij} / T_{ij}, \quad \text{for } i > 0 \text{ and } j \ge 0. \tag{5}$$

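From this an expression can be formulated for the mean cycle time of a job assigned to an ICS as

$$\begin{aligned}
W_i &= \sum_{j=0}^{s_i} p_{ij}\, W_{ij}, \quad \text{for } i > 0 \text{ and } j \ge 0 \\
&= \sum_{j=0}^{s_i} p_{ij}\, Q_{ij}/T_{ij} \\
&= p_{i,0}\, Q_0/T_0 + p_{i,1}\, Q_{i,1}/T_{i,1} + \sum_{j=2}^{s_i} p_{ij}\, Q_{ij}/T_{ij} \\
&= p_{i,SPR}\, Q_{SPR}/T_{SPR} + p_{i,CPU}\, Q_{i,CPU}/T_{i,CPU} + \sum_{j=2}^{s_i} p_{ij}\, Q_{ij}/T_{ij}.
\end{aligned} \tag{6}$$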
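The cycle time computation of (5) and (6) amounts to a weighted sum of per-device waits. A minimal sketch follows, assuming the visit probabilities, mean queue lengths, and throughputs of one ICS have already been obtained from a model; the function names and the list layout (index 0 for the SPR, index 1 for the CPU, the rest for the remaining devices) are illustrative choices, not part of the original text.

```python
def mean_wait(q, t):
    """Little's result as used in (5): mean wait W = Q / T at one device."""
    return q / t

def mean_cycle_time(p, q, t):
    """Mean cycle time in the form of (6): sum over devices of p[j] * Q[j] / T[j].

    p, q, t hold one entry per device of a single ICS: visit probability,
    mean queue length, and throughput. Index 0 is taken to be the shared
    resource (SPR) and index 1 the CPU.
    """
    if not (len(p) == len(q) == len(t)):
        raise ValueError("p, q and t must have one entry per device")
    return sum(pj * mean_wait(qj, tj) for pj, qj, tj in zip(p, q, t))
```

Evaluating `mean_cycle_time` before and after an expansion gives the before and after comparison discussed in the text.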

Given an SCS system consisting of R ICSs, the desire is at least to maintain the same mean cycle time after a modular expansion resulting in an SCS system of R' ICSs. It is assumed that this can be accomplished by increasing the mean processing rate of the SPR, and further that this increase can be expressed as some multiplicative factor β. This relation can be expressed using the mean cycle time performance measure as

$$W_i(\beta\mu_{SPR}, R') \le W_i(\mu_{SPR}, R), \quad \text{or}$$

$$\begin{aligned}
& p_{i,SPR}\, \frac{Q_{SPR}(\beta\mu_{SPR}, R')}{T_{SPR}(\beta\mu_{SPR}, R')} + p_{i,CPU}\, \frac{Q_{i,CPU}}{T_{i,CPU}} + \sum_{j=2}^{s_i} p_{ij}\, \frac{Q_{ij}}{T_{ij}} \\
&\qquad \le\; p_{i,SPR}\, \frac{Q_{SPR}(\mu_{SPR}, R)}{T_{SPR}(\mu_{SPR}, R)} + p_{i,CPU}\, \frac{Q_{i,CPU}}{T_{i,CPU}} + \sum_{j=2}^{s_i} p_{ij}\, \frac{Q_{ij}}{T_{ij}}
\end{aligned} \tag{7}$$

where

$$R' > R \ge 1, \quad \text{and}$$

$$\beta = \alpha\, R'/R, \quad \text{or} \tag{8a}$$

$$\beta = 1 + \alpha\, (R'/R - 1). \tag{8b}$$

By successfully increasing the SPR processing rate to handle the incremental load of R' - R additional ICSs, it can be assumed that the wait at each device within each ICS remains the same. This results in (7) reducing to the wait at the SPR only, which is

$$\begin{aligned}
p_{i,SPR}\, \frac{Q_{SPR}(\beta\mu_{SPR}, R')}{T_{SPR}(\beta\mu_{SPR}, R')} &\le p_{i,SPR}\, \frac{Q_{SPR}(\mu_{SPR}, R)}{T_{SPR}(\mu_{SPR}, R)} \\
\frac{Q_{SPR}(\beta\mu_{SPR}, R')}{T_{SPR}(\beta\mu_{SPR}, R')} &\le \frac{Q_{SPR}(\mu_{SPR}, R)}{T_{SPR}(\mu_{SPR}, R)}.
\end{aligned} \tag{9}$$

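By reformulating (4) and substituting equation (31) of Chapter III we obtain

$$\begin{aligned}
T_{SPR} = \mu_{SPR} A_{SPR} &= \mu_{SPR} \Big\{ 1 - \frac{\prod_{i=1}^{R} g_i(J_i)}{G(J)} \Big\} \\
&= \mu_{SPR}\, \frac{1}{G(J)} \sum_{k=1}^{K} k! \Big\{ \sum_{\sum_{i=1}^{R} n_{i,0} = k} \ \prod_{r=1}^{R} h(r; J_r - n_{r,0}) \Big\}.
\end{aligned} \tag{10}$$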

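Substituting (2) and (10) into (9), a complete expression for the inequality is obtained of the form

$$\frac{\dfrac{1}{G(J')} \displaystyle\sum_{k'=1}^{K'} k'!\,k' \Big\{ \sum_{\sum_{i=1}^{R'} n_{i,0} = k'} \ \prod_{r=1}^{R'} h(r; J_r - n_{r,0}) \Big\}}{\beta\mu_{SPR}\, \dfrac{1}{G(J')} \displaystyle\sum_{k'=1}^{K'} k'! \Big\{ \sum_{\sum_{i=1}^{R'} n_{i,0} = k'} \ \prod_{r=1}^{R'} h(r; J_r - n_{r,0}) \Big\}}
\;\le\;
\frac{\dfrac{1}{G(J)} \displaystyle\sum_{k=1}^{K} k!\,k \Big\{ \sum_{\sum_{i=1}^{R} n_{i,0} = k} \ \prod_{r=1}^{R} h(r; J_r - n_{r,0}) \Big\}}{\mu_{SPR}\, \dfrac{1}{G(J)} \displaystyle\sum_{k=1}^{K} k! \Big\{ \sum_{\sum_{i=1}^{R} n_{i,0} = k} \ \prod_{r=1}^{R} h(r; J_r - n_{r,0}) \Big\}}$$

Expanding the h functions in the above expression, from their definition in chapter III.B between equations (14) and (15), results in

$$\frac{\dfrac{1}{G(J')} \displaystyle\sum_{k'=1}^{K'} k'!\,k' \Big\{ \sum_{\sum_{i=1}^{R'} n_{i,0} = k'} \ \prod_{r=1}^{R'} \frac{{x'}_{r,0}^{\,n_{r,0}}}{n_{r,0}!}\, g_r(J_r - n_{r,0}) \Big\}}{\beta\mu_{SPR}\, \dfrac{1}{G(J')} \displaystyle\sum_{k'=1}^{K'} k'! \Big\{ \sum_{\sum_{i=1}^{R'} n_{i,0} = k'} \ \prod_{r=1}^{R'} \frac{{x'}_{r,0}^{\,n_{r,0}}}{n_{r,0}!}\, g_r(J_r - n_{r,0}) \Big\}}
\;\le\;
\frac{\dfrac{1}{G(J)} \displaystyle\sum_{k=1}^{K} k!\,k \Big\{ \sum_{\sum_{i=1}^{R} n_{i,0} = k} \ \prod_{r=1}^{R} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!}\, g_r(J_r - n_{r,0}) \Big\}}{\mu_{SPR}\, \dfrac{1}{G(J)} \displaystyle\sum_{k=1}^{K} k! \Big\{ \sum_{\sum_{i=1}^{R} n_{i,0} = k} \ \prod_{r=1}^{R} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!}\, g_r(J_r - n_{r,0}) \Big\}}$$

where

$$K' = R' J_i, \qquad K = R J_i,$$

$$J' = (J_1, \ldots, J_{R'}), \qquad J = (J_1, \ldots, J_R),$$

$$J_i = J_j \quad \text{for all } i \text{ and } j, \text{ and}$$

$${x'}_{r,0}^{\,n_{r,0}} = \left( x_{r,0} / \beta \right)^{n_{r,0}}.$$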

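Cancelling similar terms, cross multiplying, and moving everything to the left hand side of the inequality results in

$$\frac{\displaystyle\sum_{k=1}^{K} k! \Big\{ \sum_{\sum_{i=1}^{R} n_{i,0} = k} \ \prod_{r=1}^{R} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!}\, g_r(J_r - n_{r,0}) \Big\}}{\beta \displaystyle\sum_{k'=1}^{K'} k'! \Big\{ \sum_{\sum_{i=1}^{R'} n_{i,0} = k'} \ \prod_{r=1}^{R'} \frac{{x'}_{r,0}^{\,n_{r,0}}}{n_{r,0}!}\, g_r(J_r - n_{r,0}) \Big\}}
\;-\;
\frac{\displaystyle\sum_{k=1}^{K} k!\,k \Big\{ \sum_{\sum_{i=1}^{R} n_{i,0} = k} \ \prod_{r=1}^{R} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!}\, g_r(J_r - n_{r,0}) \Big\}}{\displaystyle\sum_{k'=1}^{K'} k'!\,k' \Big\{ \sum_{\sum_{i=1}^{R'} n_{i,0} = k'} \ \prod_{r=1}^{R'} \frac{{x'}_{r,0}^{\,n_{r,0}}}{n_{r,0}!}\, g_r(J_r - n_{r,0}) \Big\}}
\;\le\; 0. \tag{11}$$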

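Inspecting (11) we notice that all terms may be moved into the innermost summation. Noting the similarity of the summation in both numerators and denominators, a simplifying notation is introduced. Let

$$b_k = \Big\{ \sum_{\sum_{i=1}^{R} n_{i,0} = k} k! \prod_{r=1}^{R} \frac{x_{r,0}^{\,n_{r,0}}}{n_{r,0}!}\, g_r(J_r - n_{r,0}) \Big\}, \quad \text{and}$$

$$b'_{k'} = \Big\{ \sum_{\sum_{i=1}^{R'} n_{i,0} = k'} k'! \prod_{r=1}^{R'} \frac{{x'}_{r,0}^{\,n_{r,0}}}{n_{r,0}!}\, g_r(J_r - n_{r,0}) \Big\}.$$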

Substituting this notation into (11) yields

$$\frac{\sum_{k=1}^{K} b_k}{\beta \sum_{k'=1}^{K'} b'_{k'}} - \frac{\sum_{k=1}^{K} k\, b_k}{\sum_{k'=1}^{K'} k'\, b'_{k'}} \le 0.$$

Separating the common outer summation term results in

$$\sum_{k=1}^{K} b_k \left[ \frac{1}{\beta \sum_{k'=1}^{K'} b'_{k'}} - \frac{k}{\sum_{k'=1}^{K'} k'\, b'_{k'}} \right] \le 0. \tag{12}$$

The b_k's of this inequality are always positive. Therefore, for the inequality to be satisfied requires the inner term to act as a weighting function and force the entire expression to be non-positive. Although a solution may exist for β which will satisfy the equality, no obvious method of obtaining it is apparent. In an attempt to satisfy the inequality and obtain a lower bound for β we will investigate the situation when the inner term is always ≤ 0. By inspection of (12) we notice that within the inner term, except for k in the numerator, the other terms are independent of k. As a result this inner term achieves its maximum value at the minimum value of k, which is 1. If this term is ≤ 0 for its maximum value, it is ≤ 0 for all values of k, and the inequality is satisfied. This leaves the following relation for a lower bound solution for β:

$$\left[ \frac{1}{\beta \sum_{k'=1}^{K'} b'_{k'}} - \frac{1}{\sum_{k'=1}^{K'} k'\, b'_{k'}} \right] \le 0.$$

By rearranging the above we obtain

$$\left[ \sum_{k'=1}^{K'} k'\, b'_{k'} - \beta \sum_{k'=1}^{K'} b'_{k'} \right] \le 0.$$

Repeating the previously applied separation process yields

$$\sum_{k'=1}^{K'} b'_{k'} \left[ k' - \beta \right] \le 0.$$

Again applying our previous arguments we obtain a further lower bound solution to the expression, since b'_k' is always positive. This inner term achieves its maximum value at the maximum value of k', which is K' = R' J_i. Therefore, the inequality is always satisfied if it is satisfied for k' = K'. This yields the following lower bound solution for β:

$$\beta = K' = R' J_i.$$

Solving for α by substituting (8a) into the above yields α = R J_i.

The  interpretation  of  this  result  implies  that  by  increasing  the  processing  rate  of  the  SPR  by  a 
factor  commensurate  with  the  total  resulting  number  of  jobs  in  the  entire  system  one  will  be 
assured  no  degradation  in  response  occurs  as  compared  to  the  response  prior  to  the  expansion. 
Unfortunately  this  is  such  an  extremely  high  lower  bound  that  the  result  is  not  very  useful. 
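The conservative nature of this bound can be checked numerically. The sketch below assumes (12) has the form sum over k of b_k [ 1/(β Σ b'_k') - k/(Σ k' b'_k') ] ≤ 0 with k and k' starting at 1, and draws arbitrary positive weights in place of the true b_k and b'_k' (which would require the full exact model). The function names, the random weight ranges, and the tolerance are illustrative assumptions.

```python
import random

def inequality_12(b, b_prime, beta):
    """Left-hand side of (12) for weight lists b (k = 1..K) and b_prime (k' = 1..K')."""
    s0 = sum(b_prime)                                           # sum of b'_k'
    s1 = sum(k * bk for k, bk in enumerate(b_prime, start=1))   # sum of k' * b'_k'
    return sum(bk * (1.0 / (beta * s0) - k / s1)
               for k, bk in enumerate(b, start=1))

def check_bound(trials=1000, seed=0):
    """Return True if beta = K' satisfies (12) for every random positive weight set."""
    rng = random.Random(seed)
    for _ in range(trials):
        big_k = rng.randint(1, 12)
        big_k_prime = rng.randint(big_k + 1, 24)
        b = [rng.uniform(0.01, 10.0) for _ in range(big_k)]
        b_prime = [rng.uniform(0.01, 10.0) for _ in range(big_k_prime)]
        if inequality_12(b, b_prime, beta=big_k_prime) > 1e-12:
            return False
    return True
```

Because Σ k' b'_k' can never exceed K' Σ b'_k', the choice β = K' makes every inner term non-positive regardless of the weights, which is why the check holds for any positive inputs and why the bound is so loose.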

In retrospect, based on the results derived in the next section, if one repeats this procedure and differs only by substituting in (12) the maximum (K = R J_i) rather than the minimum (1) value of k, the result obtained is:

β = R'/R and, from either (8a) or (8b), α = 1.
This result is much more intuitively appealing due to its linear one-to-one relation, but its derivation cannot be substantiated from the above equations. Although b_k < b_k+1 for all k, with b_k increasing factorially, and the inner term of (12) is linearly increasing in the negative direction with k, this is not sufficient to conclude that

$$\left| \, b_K \left[ \frac{1}{\beta \sum_{k'=1}^{K'} b'_{k'}} - \frac{K}{\sum_{k'=1}^{K'} k'\, b'_{k'}} \right] \right| \;\ge\; \sum_{k=1}^{K-1} b_k \left[ \frac{1}{\beta \sum_{k'=1}^{K'} b'_{k'}} - \frac{k}{\sum_{k'=1}^{K'} k'\, b'_{k'}} \right].$$
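The two candidate results of this section, the provable bound β = R' J_i and the conjectured value β = R'/R, can be compared for a concrete expansion with a few lines of code. The function names below are hypothetical; the formulas are those of (8a), (8b), and the two results above.

```python
def beta_conservative(r_new, jobs_per_ics):
    """Provable lower bound from the exact analysis: beta = K' = R' * J_i."""
    return r_new * jobs_per_ics

def beta_proportional(r_old, r_new):
    """Conjectured value: beta = R'/R."""
    return r_new / r_old

def alpha_8a(beta, r_old, r_new):
    """Invert (8a), beta = alpha * R'/R, for alpha."""
    return beta * r_old / r_new

def alpha_8b(beta, r_old, r_new):
    """Invert (8b), beta = 1 + alpha * (R'/R - 1), for alpha (requires R' > R)."""
    return (beta - 1.0) / (r_new / r_old - 1.0)

if __name__ == "__main__":
    r_old, r_new, jobs = 2, 4, 4
    print(beta_conservative(r_new, jobs))     # SPR speed-up factor under the bound
    print(beta_proportional(r_old, r_new))    # SPR speed-up factor under the conjecture
```

For an expansion from 2 to 4 ICSs with 4 jobs each, the provable bound asks for a sixteenfold faster SPR, whereas the proportional value asks only for a doubling, which illustrates why the former is of little practical use.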


B.  Approximate  Analysis 


Analysis of the expression for mean cycle time of the exact SCS model has resulted in a disappointingly high lower bound for β. In this section the approximate SCS model developed in chapter IV is used to perform a similar analysis.

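We can immediately write an expression similar to (6), from (9) and (11) of chapter IV, as

$$
\begin{aligned}
W_i &\approx e_{SPR_i} W_{SPR} + W_{CPU_i} + \sum_{j=2}^{s_i} e_{PPU_{ij}} W_{PPU_{ij}} \\
    &= \frac{e_{SPR_i} Q_{SPR}}{\sigma_{SPR}} + \frac{Q_{CPU_i}}{\sigma_{CPU_i}} + \sum_{j=2}^{s_i} \frac{e_{PPU_{ij}} Q_{PPU_{ij}}}{\sigma_{ij}} \\
    &= \frac{1}{\sigma_{CPU_i}} \Big\{ \frac{Q_{SPR}}{R} + Q_{CPU_i} + \sum_{j=2}^{s_i} Q_{PPU_{ij}} \Big\}
\end{aligned}
\tag{13}
$$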
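As a rough numerical illustration of (13), the sketch below assumes that every device behaves as an independent M/M/1 queue, so that the mean queue length is rho/(1 - rho) and the mean time at a device is the queue length divided by its job flow rate. The function and argument names are illustrative only and are not part of the model's notation.

```python
def mm1_queue_length(rho):
    """Mean number of jobs at an M/M/1 queue with traffic intensity rho."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("traffic intensity must satisfy 0 <= rho < 1")
    return rho / (1.0 - rho)


def mm1_wait(mu, rho):
    """Mean time at an M/M/1 queue (waiting plus service).

    Equals Q / flow_rate with flow_rate = rho * mu, which simplifies to
    1 / (mu * (1 - rho)).
    """
    if mu <= 0.0 or not 0.0 <= rho < 1.0:
        raise ValueError("need mu > 0 and 0 <= rho < 1")
    return 1.0 / (mu * (1.0 - rho))


def cycle_time(flow_cpu, num_ics, q_spr, q_cpu, q_ppus):
    """Last form of (13): (Q_SPR / R + Q_CPU + sum of Q_PPU) / CPU flow rate."""
    return (q_spr / num_ics + q_cpu + sum(q_ppus)) / flow_cpu


if __name__ == "__main__":
    # SPR figures quoted in Example 1 (mu = .25, rho = .3508) and
    # IDBS figures quoted in Example 2 (mu = 12.5, rho = .6797).
    print(round(mm1_wait(0.25, 0.3508), 2))
    print(round(mm1_wait(12.5, 0.6797), 3))
```

Under these assumptions the two calls give about 6.16 and 0.250, which are the SPR and IDBS wait times quoted in the examples of section C.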
The intent is to maintain the mean cycle time of a job after a modular expansion. Therefore, a relation similar to (7) can be established as

$$
W_i(\beta\mu_{SPR}, R') \le W_i(\mu_{SPR}, R), \quad \text{or} \tag{14}
$$

$$
\frac{1}{\sigma_{CPU_i}} \Big\{ \frac{Q'_{SPR}}{R'} + Q_{CPU_i} + \sum_{j=2}^{s_i} Q_{PPU_{ij}} \Big\}
\le
\frac{1}{\sigma_{CPU_i}} \Big\{ \frac{Q_{SPR}}{R} + Q_{CPU_i} + \sum_{j=2}^{s_i} Q_{PPU_{ij}} \Big\}
$$


Assuming that the processing rate of the SPR is successfully increased to satisfy the above inequality, it can be assumed that the wait at each device within each ICS remains the same. This further implies that $\sigma_{ij}$ also remains the same. This reduces (14) to

$$
\frac{1}{\sigma_{CPU_i}} \Big\{ \frac{Q'_{SPR}}{R'} \Big\} \le \frac{1}{\sigma_{CPU_i}} \Big\{ \frac{Q_{SPR}}{R} \Big\}
$$

$$
\frac{Q'_{SPR}}{R'} \le \frac{Q_{SPR}}{R} \tag{15}
$$

$$
\frac{1/R'}{\dfrac{\beta\mu_{SPR}}{R' p_{SPR_i} \sigma_{CPU_i}} - 1}
\le
\frac{1/R}{\dfrac{\mu_{SPR}}{R\, p_{SPR_i} \sigma_{CPU_i}} - 1}
$$

By inspection of (15) one can determine that maintaining a constant ratio

$\beta\mu_{SPR}/R' = C$, where $C = \mu_{SPR}/R$,

results in

$\beta = R'/R$ and, from (8a), $\alpha = 1$,

which satisfies the inequality (the two denominators are then equal, while $1/R' < 1/R$) and actually improves the cycle time rather than just maintaining it. This was conjectured, but not proven, at the end of the preceding section. Maintaining the assumption that $\sigma_{CPU_i}$ remains the same, the equality of (15) is solved for the required $\beta$.
$$
R \Big( \frac{\mu_{SPR}}{R\, p_{SPR_i} \sigma_{CPU_i}} - 1 \Big) = R' \Big( \frac{\beta\mu_{SPR}}{R' p_{SPR_i} \sigma_{CPU_i}} - 1 \Big)
$$

$$
(\beta - 1) = (p_{SPR_i} \sigma_{CPU_i} / \mu_{SPR}) (R' - R)
$$

Noting that $\rho_{SPR} = R\, p_{SPR_i} \sigma_{CPU_i} / \mu_{SPR}$ and substituting it into the above yields

$$
(\beta - 1) = (\rho_{SPR}/R) (R' - R).
$$

This may also be expressed as

$$
\beta = 1 + (\rho_{SPR}/R)(R' - R) = 1 + \rho_{SPR} (R'/R - 1). \tag{16}
$$

From this, and (8b), $\alpha = \rho_{SPR}$ is obtained.
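The algebra leading to (16) can be checked numerically. The following is a minimal sketch, assuming the M/M/1 queue length rho/(1 - rho) for the SPR and holding the transition probability and CPU flow rate fixed, as in the derivation above. The function names and the sample values are illustrative assumptions, not part of the original analysis.

```python
def beta_required(rho_spr, num_ics, new_num_ics):
    """Equation (16): factor by which the SPR processing rate must grow."""
    return 1.0 + rho_spr * (new_num_ics / num_ics - 1.0)


def rho_after_expansion(rho_spr, num_ics, new_num_ics, beta):
    """SPR traffic intensity after expansion.

    rho = R * p * sigma / mu, so with p and sigma held fixed the intensity
    scales by (R'/R) / beta.
    """
    return rho_spr * (new_num_ics / num_ics) / beta


def spr_term(rho_spr, num_ics):
    """Q_SPR / R from (15), using the M/M/1 queue length rho / (1 - rho)."""
    return rho_spr / (1.0 - rho_spr) / num_ics


if __name__ == "__main__":
    rho, R, R_new = 0.3508, 2, 16          # illustrative values
    before = spr_term(rho, R)

    beta = beta_required(rho, R, R_new)    # fractional rule, (16)
    after = spr_term(rho_after_expansion(rho, R, R_new, beta), R_new)
    print(beta, before, after)             # before and after coincide

    unity = R_new / R                      # beta = R'/R, i.e. alpha = 1
    after_unity = spr_term(rho_after_expansion(rho, R, R_new, unity), R_new)
    print(unity, after_unity)              # strictly smaller than before
```

With beta taken from (16) the SPR term of the cycle time is unchanged by the expansion, whereas the choice beta = R'/R leaves the traffic intensity unchanged and therefore reduces the term by the factor R/R'.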

To evaluate the accuracy of the wait time performance measures and, therefore, the accuracy of this modular expansion result, we present an error analysis similar to that of chapter IV. The relative error scatter and mean value plots of the CPU and SPR mean wait time are presented in Figures V-1 and V-2 for the entire sample space. The relative error statistics for this performance measure are listed by parameter in Tables V-1 and V-2. Representative comparison curves for the exact and approximate models are plotted in Figures V-3 and V-4.

As can be seen from Tables V-1 and V-2, this performance measure exhibits some parameter sensitivity. Some sensitivity is indicated for the s parameter. The CPU indicates a decreasing error as s increases, while the SPR shows the opposite response. This sensitivity is attributed to the large number of PPUs spawned by increasing s, resulting in a greater portion of the workload being handled by the PPUs, decreasing the SPR and CPU workload.

Both devices demonstrate a sensitivity to the $p_{SPR}$ parameter. The CPU does not exhibit any trend, while the SPR shows an increasing error trend with increasing $p_{SPR}$. Both devices exhibit a decreasing error as the J parameter increases. The two devices demonstrate opposite sensitivity trends to the $\mu_{SPR}$ and the R parameters. The CPU shows an increasing mean error, while the SPR shows a decreasing error for increasing R. For increasing $\mu_{SPR}$, the CPU shows an increasing error, while the SPR first shows a decreasing error and then, as $\mu_{SPR}$ gets large, tends to increase. Therefore, the wait time performance measure exhibits the greatest sensitivity of all the performance measures. Although wait time possesses the lowest mean relative error, it also possesses the highest variance, and demonstrates sensitivity to all parameters. This is reasonable to expect since the wait time is a function of the other two performance measures and, therefore, compounds their errors. In this case the variance was amplified while the mean was attenuated.

Parameter      Value   Points      Mean              Variance        Minimum           Maximum
(all)          -       1650/1158   -0.0795/-0.1001   0.0245/0.0309   -0.6388/-0.6388   0.3056/0.1515

$p_{SPR}$      0.10    330/266     -0.0739/-0.0857   0.0203/0.0238   -0.4489/-0.4489   0.0798/0.0798
               0.25    330/260     -0.0686/-0.0778   0.0247/0.0297   -0.5893/-0.5893   0.0787/0.0787
               0.50    330/208     -0.1526/-0.2246   0.0378/0.0390   -0.5647/-0.5647   0.3056/0.0777
               0.75    330/227     -0.0000/0.0075    0.0026/0.0021   -0.1042/-0.0947   0.2500/0.0970
               0.90    330/197     -0.1025/-0.1416   0.0253/0.0352   -0.6388/-0.6388   0.1515/0.1515

J              2.00    600/458     -0.1169/-0.1447   0.0335/0.0387   -0.6388/-0.6388   0.3056/0.1515
               4.00    600/411     -0.0682/-0.0818   0.0206/0.0264   -0.5352/-0.5074   0.3056/0.0685
               6.00    450/289     -0.0448/-0.0555   0.0147/0.0196   -0.4723/-0.3813   0.0956/0.0956

s              2.00    550/387     -0.1000/-0.1275   0.0287/0.0353   -0.6388/-0.6388   0.3056/0.1515
               6.00    550/389     -0.0420/-0.0492   0.0138/0.0175   -0.4914/-0.4914   0.3056/0.1515
               11.00   550/382     -0.0966/-0.1243   0.0291/0.0365   -0.5893/-0.5893   0.3056/0.1515

R              1.00    450/450     -0.0786/-0.0786   0.0271/0.0271   -0.5814/-0.5814   0.0956/0.0956
               2.00    450/383     -0.0787/-0.0927   0.0248/0.0277   -0.5811/-0.5811   0.1515/0.1515
               5.00    450/213     -0.0799/-0.1287   0.0215/0.0354   -0.6239/-0.6239   0.2500/0.0816
               8.00    300/112     -0.0815/-0.1575   0.0250/0.0431   -0.6388/-0.6388   0.3056/0.0670

$\mu_{SPR}$    0.01    165/76      0.0089/0.0133     0.0069/0.0017   -0.1111/-0.0622   0.3056/0.1515
               0.02    165/79      -0.0152/-0.0008   0.0011/0.0005   -0.1042/-0.0617   0.0426/0.0402
               0.05    165/85      -0.0084/-0.0003   0.0007/0.0006   -0.1044/-0.1044   0.0798/0.0798
               0.10    165/92      -0.0123/-0.0106   0.0018/0.0029   -0.2889/-0.2889   0.0787/0.0787
               0.20    165/100     -0.0257/-0.0292   0.0056/0.0087   -0.4051/-0.4051   0.0777/0.0777
               0.50    165/120     -0.0745/-0.0856   0.0159/0.0203   -0.5105/-0.5105   0.0956/0.0956
               1.00    165/131     -0.1288/-0.1389   0.0272/0.0320   -0.5644/-0.5644   0.0771/0.0771
               2.00    165/147     -0.1721/-0.1691   0.0390/0.0422   -0.5893/-0.5893   0.0805/0.0805
               5.00    165/163     -0.1876/-0.1840   0.0466/0.0461   -0.6388/-0.6388   0.0828/0.0828
               10.00   165/165     -0.1797/-0.1797   0.0445/0.0445   -0.5898/-0.5898   0.0819/0.0819

Table V-1. Relative error statistics for $W_{CPU}$, for both all $\rho$ and $\rho < .90$ (each entry is given as all $\rho$ / $\rho < .90$).

Parameter      Value   Points      Mean              Variance        Minimum           Maximum
(all)          -       1650/1158   -0.0621/-0.0656   0.0248/0.0346   -0.5974/-0.5974   0.4989/0.4989

$p_{SPR}$      0.10    330/266     -0.0367/-0.0325   0.0188/0.0227   -0.5974/-0.5974   0.3547/0.3547
               0.25    330/260     -0.0299/-0.0247   0.0241/0.0298   -0.5810/-0.5810   0.4402/0.4402
               0.50    330/208     -0.0899/-0.1105   0.0239/0.0358   -0.5930/-0.5930   0.3320/0.3320
               0.75    330/227     -0.0604/-0.0634   0.0266/0.0382   -0.5774/-0.5774   0.4989/0.4989
               0.90    330/197     -0.0936/-0.1194   0.0275/0.0436   -0.5916/-0.5916   0.4262/0.4262

J              2.00    600/458     -0.1116/-0.1206   0.0370/0.0480   -0.5974/-0.5974   0.3547/0.3547
               4.00    600/411     -0.0367/-0.0378   0.0187/0.0267   -0.3979/-0.3979   0.4989/0.4989
               6.00    450/289     -0.0299/-0.0178   0.0118/0.0166   -0.3018/-0.3018   0.4262/0.4262

s              2.00    550/387     -0.0518/-0.0515   0.0265/0.0368   -0.5811/-0.5811   0.4262/0.4262
               6.00    550/389     -0.0651/-0.0688   0.0235/0.0327   -0.5930/-0.5930   0.4072/0.4072
               11.00   550/382     -0.0693/-0.0766   0.0244/0.0342   -0.5974/-0.5974   0.4989/0.4989

R              1.00    450/450     -0.1553/-0.1553   0.0401/0.0401   -0.5974/-0.5974   0.3714/0.3714
               2.00    450/383     -0.0767/-0.0738   0.0172/0.0201   -0.3964/-0.3964   0.3759/0.3759
               5.00    450/213     -0.0055/0.0462    0.0100/0.0142   -0.2500/-0.2500   0.4262/0.4262
               8.00    300/112     0.0147/0.1103     0.0117/0.0140   -0.0946/-0.0730   0.4989/0.4989

$\mu_{SPR}$    0.01    165/76      -0.1575/-0.2698   0.0208/0.0209   -0.5974/-0.5974   -0.0308/-0.0549
               0.02    165/79      -0.1517/-0.2512   0.0223/0.0268   -0.5811/-0.5811   0.1923/0.1923
               0.05    165/85      -0.1346/-0.2062   0.0283/0.0432   -0.5797/-0.5797   0.4110/0.4110
               0.10    165/92      -0.1225/-0.1717   0.0278/0.0438   -0.5930/-0.5930   0.3184/0.3184
               0.20    165/100     -0.1049/-0.1359   0.0279/0.0428   -0.5916/-0.5916   0.3126/0.3126
               0.50    165/120     -0.0590/-0.0623   0.0215/0.0288   -0.5223/-0.5223   0.4072/0.4072
               1.00    165/131     -0.0067/0.0001    0.0115/0.0137   -0.3395/-0.3395   0.4989/0.4989
               2.00    165/147     0.0395/0.0443     0.0092/0.0094   -0.2571/-0.2571   0.4402/0.4402
               5.00    165/163     0.0567/0.0553     0.0083/0.0083   -0.1921/-0.1921   0.4262/0.4262
               10.00   165/165     0.0197/0.0197     0.0097/0.0097   -0.5172/-0.5172   0.3759/0.3759

Table V-2. Relative error statistics for $W_{SPR}$, for both all $\rho$ and $\rho < .90$ (each entry is given as all $\rho$ / $\rho < .90$).

Equation (16) implies that by increasing the processing rate of the SPR by $\rho_{SPR}/R$ for each additional ICS in the resultant expanded system, the mean response time would be preserved when compared to that prior to the expansion. It is noted that $\rho_{SPR} < 1$ and, therefore, $\rho_{SPR}/R < 1$. This means that the incremental increase in SPR processing rate (per additional ICS), $\rho_{SPR}/R$, is only a fractional increase. This is a significant improvement over the lower bound derived in the previous section. In fact, this is an improvement over the incremental increase of unity conjectured in the preceding section.

The unity increase in processing rate is quite intuitively appealing. For each additional ICS added, the SPR processing rate should increase by 1/R from the current processing rate. The newly derived incremental increase of $\rho_{SPR}/R$ also seems intuitively correct. Consider that $\rho_{SPR}$ represents the current total traffic intensity to the SPR, so $\rho_{SPR}/R$ represents the portion of that traffic intensity generated by one ICS. From this perspective of traffic intensity, the fractional increase seems more credible. Since each ICS generates traffic proportional to $\rho_{SPR}/R$, each additional ICS would generate additional traffic also proportional to $\rho_{SPR}/R$. Therefore, if the processing rate of the SPR is increased by that fractional increase in traffic, the response time should not increase.

There is an additional implication to this fractional increase in SPR processing rate, proportional to the increase in ICSs. A system of this class may undergo one or more modular expansions, encompassing a large increase in the number of ICSs. The extent of expansion will in general be limited technically by the maximum attainable processing rate of the SPR, which is dependent, for the most part, on the existing technology for that type of device. The processing rate of applicable devices will generally span several orders of magnitude and may include a number of different technologies. Therefore, depending on a fractional processing rate increase, rather than a unity or greater increase, will allow a given technology to support a larger range of expansion. This minimizes, or at least delays, the implementation and investment risk of changing device technology in a given system. Additionally, it will also allow for a far greater maximum expansion range, since the overall existing technology limit will be approached at a much slower rate.
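The effect on the attainable expansion range can be illustrated by inverting (16) for R'. The sketch below is only indicative: `beta_max`, the largest speed-up a given device technology is assumed to offer over the current SPR, is a hypothetical input, and the function names are illustrative.

```python
import math


def max_ics_fractional(rho_spr, num_ics, beta_max):
    """Largest R' satisfying 1 + rho * (R'/R - 1) <= beta_max, from (16)."""
    if not 0.0 < rho_spr < 1.0 or beta_max < 1.0:
        raise ValueError("need 0 < rho < 1 and beta_max >= 1")
    limit = num_ics * (1.0 + (beta_max - 1.0) / rho_spr)
    return math.floor(limit + 1e-9)   # small guard against round-off


def max_ics_unity(num_ics, beta_max):
    """Largest R' under the unity rule beta = R'/R."""
    return math.floor(num_ics * beta_max + 1e-9)


if __name__ == "__main__":
    # Hypothetical case: 2 ICSs, rho_SPR = .3508, and a technology that can
    # be made at most ten times faster than the current SPR.
    print(max_ics_fractional(0.3508, 2, 10.0))
    print(max_ics_unity(2, 10.0))
```

For these assumed inputs the fractional rule admits 53 ICSs before the technology limit is reached, against 20 under the unity rule.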


C.   Applications 


Two  examples  of  the  SPR  architecture  are  presented  to  illustrate  the  utility  of  the 
approximate  SCS  model.   The  first  example  is  a  complex  of  multiple  minicomputers  linked  to  a 
common  shared  secondary  memory  subsystem  by  a  local  area  network  (LAN).  This  is  applicable 
to  engineering  and  scientific  environments.  The  second  example  is  a  point-of-sales  (POS) 
application.  This  is  applicable  to  grocery  and  department  stores,  and  has  a  direct  analogy  to 
certain  office  automation  environments. 


Example  1 


Suppose the current processing system of a technical organization consists of 2 minicomputers (ICSs); each has an average multiprogramming level of eight, and both share a common secondary storage subsystem. The minicomputers are identical and each has a CPU and the same complement of four peripheral devices (PPUs). These PPUs consist of (1) an input card reader (CR), (2) an output line printer (LP), (3) a private local disk (disk), and (4) a set of interactive devices (TTY).

Operationally, each ICS functions independently, processing both batch and interactive jobs. The composition of both types of jobs is the same, so no distinction between them is necessary. The set of interactive devices, as a set, has been characterized as a single device with an exponential service time distribution. Therefore, we may aggregate these interactive devices and represent them as a single device in our model. The local disk has two functions: (1) the system software resides there, and (2) during processing of a job it acts as a cache between the ICS and the SPR.

The jobs exhibit an exponential service time distribution at each device. The CPU mean service time of a job is 25 msec, which in the SCS model is normalized to 1. For each device the mean service time, its corresponding normalized value, and its estimated transition probability are as follows:

mean service time             normalized service time   estimated transition probability
$\mu_{SPR}$  = 100 msec       $\mu_0$ = .25             $p_0$ = .25
$\mu_{CPU}$  = 25 msec        $\mu_1$ = 1.0             $p_1$ = .05
$\mu_{disk}$ = 25 msec        $\mu_2$ = 1.0             $p_2$ = .45
$\mu_{TTY}$  = 2.5 sec        $\mu_3$ = .01             $p_3$ = .05
$\mu_{LP}$   = 250 msec       $\mu_4$ = .10             $p_4$ = .10
$\mu_{CR}$   = 250 msec       $\mu_5$ = .10             $p_5$ = .05
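The normalization used in the table can be reproduced with a few lines of Python. This is only a sketch: the dictionary keys and the helper name are illustrative, and the CPU and LP service times (25 msec and 250 msec) are read from the text and from the normalized values, respectively.

```python
CPU_MS = 25.0  # CPU mean service time; its normalized value is 1

# Mean service times in msec, as listed for Example 1.
SERVICE_MS = {
    "SPR": 100.0,
    "CPU": 25.0,
    "disk": 25.0,
    "TTY": 2500.0,   # 2.5 sec
    "LP": 250.0,
    "CR": 250.0,
}


def normalized_value(mean_ms, cpu_ms=CPU_MS):
    """Normalized value of a device: CPU service time over device service time."""
    return cpu_ms / mean_ms


if __name__ == "__main__":
    for name, ms in SERVICE_MS.items():
        print(f"{name:5s} {ms:7.1f} msec -> {normalized_value(ms):.2f}")
```

Note that the normalized values grow with device speed (.25 for the 100 msec SPR, .01 for the 2.5 sec TTY), so they behave as processing rates relative to the CPU.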

The installation is about to be modified. The organization is expanding and has determined a requirement to expand the processing complex eightfold. They have decided to implement a local area network (LAN) which will allow interactive device access to the central processing complex from the desk of each employee. The bandwidth of the LAN is sufficiently high that it will not be a bottleneck or cause any significant delay; therefore, the LAN may be neglected in our analysis. Due to existing software investment and staff familiarity, the organization plans to retain the existing two minicomputers and modularly expand by adding identical ones as required.

The  shared  common  secondary  storage  subsystem  has  been  very  successful  for  sharing,  rather 
than  duplicating,  programs  and  data  bases  as  well  as  providing  an  effective  electronic  mail  system. 
Therefore,  the  retention  and  expansion  of  this  facility  is  also  planned.   The  organization  is 
expected to grow over the next two years and the modular expansion of the processing facilities is
planned  to  coincide.   The  maximum  planned  processing  expansion  is  for  a  complex  of  16 
minicomputers.   Two  minicomputers  are  currently  being  acquired  to  bring  the  complex  to  4 
minicomputers. 

The  current  SPR  does  not  have  sufficient  speed  or  capacity  to  handle  the  planned  expansion. 
It  is  desired  to  size  the  SPR  so  that  its  current  mean  response  time  is  maintained.   In  addition, 
there  is  potential  for  additional  future  organizational  growth,  which  may  result  in  a  further 
processing  expansion  to  32  minicomputers.  This  potential  growth  is  5  to  8  years  away  and  no 
definite planning is currently being done. It is desired to know whether the SPR sized for the 16 ICS system will be able to adequately service a 32 ICS system and, if not, whether there is one that will.

To determine the approximate job flow rate and traffic intensity of the current system we solve (7) of chapter IV using the BIN algorithm and the system parameters listed above. These values are then used to obtain the current approximate mean SPR response time by applying (13). This results in $\rho_{SPR} = .3508$ and $W_{SPR} = 6.16$. Repeating this procedure for the 4 ICS configuration results in $W_{SPR} = 13.2$. Similarly, the planned 16 ICS configuration yields $W_{SPR} = 475$, and the hypothesized 32 ICS configuration yields $W_{SPR} = 992$. Applying (14) and (16), using the previously computed value of $\rho_{SPR}$, we obtain the $\beta$'s, from which the processing rate of an SPR for each configuration is computed to be:


$\mu_{SPR}(4)$  = .338 (74 msec)   and   $\beta(4)$  = 1.35
$\mu_{SPR}(16)$ = .865 (29 msec)   and   $\beta(16)$ = 3.46
$\mu_{SPR}(32)$ = 1.57 (16 msec)   and   $\beta(32)$ = 6.26
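The sizing figures for Example 1 follow directly from (16). The short sketch below recomputes them from the quoted traffic intensity and the current normalized SPR rate; the constant and function names are illustrative, and the results agree with the values listed above to within the rounding shown there.

```python
RHO_SPR = 0.3508      # current SPR traffic intensity (2 ICS system)
MU_SPR = 0.25         # current normalized SPR processing rate
CPU_MS = 25.0         # CPU mean service time, the normalization unit
CURRENT_ICS = 2


def size_spr(new_num_ics):
    """Return (beta, normalized rate, mean service time in msec) per (16)."""
    beta = 1.0 + RHO_SPR * (new_num_ics / CURRENT_ICS - 1.0)
    mu = beta * MU_SPR
    return beta, mu, CPU_MS / mu


if __name__ == "__main__":
    for n in (4, 16, 32):
        beta, mu, msec = size_spr(n)
        print(f"{n:2d} ICS: beta = {beta:.2f}, mu_SPR = {mu:.3f}, {msec:.0f} msec")
```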


Processing  rates  for  various  secondary  storage  technologies  [HOAGLA  79,  TOOMBS  78, 
WARNAR  79]  indicate  that  an  SPR  subsystem  using  existing  rotating  disk  technology  can  support 
a  16  ICS  configuration.   This  same  SPR  subsystem  cannot  adequately  support  a  32  ICS 
configuration,  although  a  faster  SPR  subsystem  using  this  same  technology  can  support  a  32  ICS 
configuration. 
Example 2


For the second example a POS environment is considered. POS can generally be described as many small independent processors, each processing a single job, and accessing a shared inventory data base subsystem (IDBS). The inventory data base may be implemented in several different configurations. One is a distributed configuration, where each geographically distinct organizational unit has its own local IDBS, which is shared among its own POS stations. Another configuration is a centralized one, where a single IDBS is located at a single site and is shared by all the POS stations. There is, of course, a spectrum of configurations in between these two extremes. The centralized configuration requires additional communication facilities from the IDBS central site to each of the geographically distinct organizational units.

Structurally each POS station consists of a processor and a number of local I/O devices. These I/O devices may include any of the following typical devices: (1) a digital display or two, (2) a printer for a sales receipt, (3) an input scanner, (4) an input alphanumeric keyboard, and (5) an auxiliary input device, for instance a scale.

A  normal  transaction  consists  of  one  or  more  human  interactions  to  enter  data  through  the 
I/O  devices.  The  station  processor,  once  it  accepts  the  data,  requests  service  from  the  IDBS  to 
process this data. This request is serviced on an FCFS basis by the IDBS with an exponentially
distributed  service  time.  Any  time  devoted  to  communication  between  a  POS  station  and  the 
IDBS  will  be  incorporated  into  the  IDBS  service  time.  It  is  assumed  that  the  communication 
subsystem  is  not  a  bottleneck  and,  therefore,  this  time  may  be  accounted  for  by  increasing  the 
mean  processing  time  of  the  IDBS.  The  processed  data  is  returned  to  the  POS  station,  which  has 
been  idle  while  waiting,  and  it  is  then  displayed.  The  cycle  is  then  repeated. 

There  is  only  a  single  job  at  each  station  and  the  station  processor  is  fast  enough  to  service  all 
of  its  requests  and  manage  all  of  its  devices.  Due  to  the  nature  of  service  requests  the  service  time 
of  the  station  processor  is  generally  constant  for  local  operations,  while  the  human  interaction 
through  the  devices  is  random  and  is  assumed  to  be  exponentially  distributed.    For  a  single  job 
there  is  no  need  to  model  each  device  within  a  station  separately  since  there  is  no  contention  for 
devices. Also, since the station processor has essentially a constant service time and is not a
bottleneck,  the  mean  time  of  the  human  interaction  distribution  can  be  increased  to  account  for  it. 
Therefore,  it  is  assumed  that  all  of  these  devices  and  the  station  processor  can  logically  be  thought 
of  as  a  single  device  with  an  exponentially  distributed  service  time.  A  feedback  loop  to  the  POS 
station  provides  for  error  conditions  that  arise,  which  mainly  occur  when  entering  data. 

The  processing  rate  of  the  POS  station  is  normalized  to  1.  Each  transaction  requires  one  or 
more  service  requests  at  the  POS  station  followed  by  a  single  service  request  to  the  IDBS.  Each 
additional  service  request  to  the  POS  station  represents  a  re-entry  of  data  which  was  required  by 
an  error  on  the  previous  attempt.  The  response  as  seen  by  the  user  is  strictly  a  local  one. 
Therefore,  the  only  time  the  IDBS  affects  the  response  time  is  when  the  data  entry  was  successful. 
Based  on  this  descriptive  model,  an  acceptable  response  time  is  on  the  order  of  1/4  sec.  The  POS 
mean  processing  time  is  1  sec.  Therefore,  IDBSs  which  have  processing  rates  many  times  faster 
than  the  POS  station  must  be  considered.  The  range  of  potential  IDBS  processing  rates  is  5  to 
2000  times  faster  than  the  POS  station.  At  2000  times  faster,  the  IDBS  would  require  a  mean 
processing  time  of  .5  msec.  This  represents  the  upper  limit  of  current  secondary  storage 
subsystem  technology  [HOAGLA  79,  TOOMBS  78,  WARNAR  79],  especially  if  a  communication 
subsystem  is  involved. 

The jobs have an exponential service time at each device. The POS station mean service time of a job is 1 sec. For this model the mean service times and their estimated transition probabilities are as follows:

mean service time             estimated transition probability
$\mu_{IDBS}$ = 5-2000         $p_{IDBS}$ = .90
$\mu_{POS}$  = 1              $p_{POS}$  = .10


It  is  desired  to  size  the  IDBS  for  the  two  configurations  under  consideration.  The  first 
configuration  being  considered  is  a  distributed  configuration  supporting  20  POS  stations.  The 
second  configuration  is  a  centralized  one  supporting  1000  POS  stations. 
To determine the approximate job flow rate and traffic intensity of a 20 POS station configuration we solve (7) of chapter IV using the BIN algorithm and the system parameters listed above. These values are then used to obtain the corresponding approximate mean IDBS response time by applying (13). This results in $\rho_{IDBS} = .6797$ and $W_{IDBS} = .250$ for $\mu_{IDBS}(20) = 12.5$. Applying (14) and (16), using the previously computed value of $\rho_{IDBS}$, we obtain the $\beta$ from which the processing rate of an IDBS to support a 1000 POS station configuration is $\mu_{IDBS}(1000) = \beta\mu_{IDBS} = 428.9$.

The local 20 POS station configuration requires an IDBS mean processing time of 80 msec ($\mu_{IDBS} = 12.5$), which is a reasonable speed for disk technology. The 1000 POS station configuration requires an IDBS mean processing time of 2.33 msec. For this configuration the IDBS is remotely located and at this speed the communications delay, although not a bottleneck, must be accounted for. This communications delay has been measured to be .75 msec, resulting in an IDBS with a mean processing time of 1.58 msec. This speed exceeds the current capability of disk technology, but can be met using magnetic bubble technology.
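The Example 2 figures can be retraced in the same way, treating each POS station as one ICS so that the expansion is from 20 to 1000 stations. This is a sketch under that reading; the names are illustrative, and the small difference from 428.9 comes from using the four-digit value of the traffic intensity.

```python
RHO_IDBS = 0.6797     # IDBS traffic intensity, 20 station configuration
MU_IDBS_20 = 12.5     # normalized IDBS processing rate for 20 stations
POS_MS = 1000.0       # POS mean service time (1 sec), the normalization unit
COMM_DELAY_MS = 0.75  # measured communications delay


def idbs_rate(new_stations, stations=20):
    """Normalized IDBS processing rate required after expansion, from (16)."""
    beta = 1.0 + RHO_IDBS * (new_stations / stations - 1.0)
    return beta * MU_IDBS_20


def service_ms(rate):
    """Mean processing time in msec for a normalized processing rate."""
    return POS_MS / rate


if __name__ == "__main__":
    rate_1000 = idbs_rate(1000)
    print(f"mu_IDBS(1000) = {rate_1000:.1f}")
    print(f"20 stations:   {service_ms(MU_IDBS_20):.0f} msec")
    print(f"1000 stations: {service_ms(rate_1000):.2f} msec")
    print(f"net of communications delay: "
          f"{service_ms(rate_1000) - COMM_DELAY_MS:.2f} msec")
```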

An analysis of a modular expansion was performed using both models. One of the key design aspects is the effect on performance due to a modular expansion, or conversely, the amount of increased capability required by the shared resource to continue to deliver some threshold amount of performance. The exact model yielded only a lower bound for the incremental increase in the SPR processing rate required to maintain system response time. This lower bound was too high to be of any practical use.

The analysis of the approximate model yielded a useful and intuitively satisfying relation between the addition of ICSs and the incremental increase in SPR processing rate required to maintain system response time. It indicated that for each ICS added to expand the system, the required incremental increase in SPR processing rate was directly related to the incremental traffic intensity caused by each additional ICS. The implication is that, for example, by doubling the number of ICSs, an increase in the SPR processing rate by a factor of less than 2 is required to maintain system response time, since the traffic intensity for a stable system is always < 1. This result is verified when compared to values predicted by the exact model. This result will be useful to designers and analysts when they consider building new systems or augmenting existing systems which are based on this class of architecture. Two examples illustrating the utility of the approximate model were presented.
VI. SUMMARY AND RECOMMENDATIONS

A.   Summary 

The basis of resource sharing and its application to computer architecture has been discussed. Some examples of architectures that support resource sharing were provided, and many more will be constructed. It was our intention to investigate the performance of this class of computer architecture, which shares a single processing resource among multiple independent computing systems, through the use of analytic queueing models.

Utilizing multi-class queueing network theory and the structure of this class of computer architecture, a specific queueing network model was developed. Two efficient computational algorithms, SOP and FAC, were presented which could be used to evaluate the performance measures of this model. Previously the evaluation of queueing network models required memory-space and time complexity both growing exponentially with the size of the state-space, $O[(J_i + 1)^R]$. The algorithms developed here to evaluate the model require a memory-space which grows linearly, $O[R(J_i + 1)]$, with the size of the state-space, although the time complexity still grows exponentially. This provides the designer and analyst the ability to evaluate this model when it has a large state-space if he or she is willing to invest the computation time, whereas previously it may not have been possible to evaluate this model due to memory-space limitations.
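To give a feel for the difference between the two growth rates quoted above, the following sketch simply evaluates the two order-of-growth expressions, ignoring constant factors, for a balanced system with J jobs per ICS and R ICSs. The function names are illustrative, and the figures are not measurements of the SOP or FAC algorithms.

```python
def general_storage(jobs_per_ics, num_ics):
    """Order of growth (J + 1)^R quoted for general multi-class algorithms."""
    return (jobs_per_ics + 1) ** num_ics


def scs_storage(jobs_per_ics, num_ics):
    """Order of growth R * (J + 1) quoted for the SCS algorithms."""
    return num_ics * (jobs_per_ics + 1)


if __name__ == "__main__":
    # J = 8 matches the multiprogramming level of Example 1 in chapter V.
    for R in (2, 4, 16, 32):
        print(R, general_storage(8, R), scs_storage(8, R))
```

For J = 8 and R = 16 the exponential expression is 9^16 = 1,853,020,188,851,841 entries, while the linear expression is 144.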

Although the algorithms to evaluate this exact queueing network model are memory-space efficient, they are still of exponential time complexity. This computational complexity limits the utility of the model, as is generally the case for other multi-class queueing network models. In an attempt to overcome this computational limitation, an approximate queueing model was introduced. This approximate model consists of a set of independent M/M/1 single server queues. The solution technique is based on approximating the job flow rate between these queues. An efficient algorithm, predicated on a binary search technique, was presented to evaluate the performance measures of this approximate model. The development and solution form presented apply to balanced as well as unbalanced systems. The solution algorithm and the error analysis considered only balanced systems.
To determine the utility of this approximate model a comparison of results between it and the exact model was made. A random sample space of the input parameters for these models was generated, and the corresponding performance measures evaluated. The performance measures predicted by this approximate model do result in a varying error, which is considered to be within acceptable engineering limits. The efficiency gained in evaluation of the performance measures is, for most applications, thought to be an acceptable compromise for the error incurred. For situations with extremely large state-spaces, it may be the only analysis method possible. As a result, designers and analysts are provided the capability to use the approximate model to obtain estimates of the performance of a large number of system configurations in a very short period of time. Once a small number of candidate configurations are culled, the exact model may be applied to obtain more accurate performance predictions.

An  analysis  of  a  modular  expansion  was  performed  using  both  models.  One  of  the  key  design 
aspects  is  the  effect  on  performance  due  to  a  modular  expansion,  or  conversely,  the  amount  of 
increased  capability  required  by  the  shared  resource  to  continue  to  deliver  some  threshold  amount 
of  performance.  The  exact  model  yielded  only  a  lower  bound  for  the  incremental  increase  in  the 
SPR  processing  rate  required  to  maintain  system  response  time.  This  lower  bound  was  too  high 
to  be  of  any  practical  use. 

The  analysis  of  the  approximate  model  yielded  a  useful  and  intuitively  satisfying  relation 
between  the  addition  of  ICSs  and  the  incremental  increase  in  SPR  processing  rate  required  to 
maintain system response time. It indicated that for each ICS added to expand the system, the
required incremental increase in SPR processing rate was directly related to the incremental traffic
intensity caused by each additional ICS. The implication is that, for example, doubling the
number of ICSs requires the SPR processing rate to increase by a factor of less than 2 to maintain
system response time, since the traffic intensity for a stable system is always less than 1. This result is verified
when compared to values predicted by the exact model. This result will be useful to designers
and analysts when they consider building new systems or augmenting existing systems which are
based on this class of architecture. Two examples illustrating the utility of the approximate model
were presented.

B. Research Extensions

We have stated that our position in accepting this M/M/1 approximation was that it
"tracked" the performance measures predicted by the exact models, although the values predicted
were in error by a varying degree. Similar approximate models based on other independent
queueing systems or a mix of different systems may provide a better "fit" than the M/M/1. The
M/M/1 system was chosen because the simplicity of its mathematical formulation provides both
an expression that yields insight and one that allows manipulation for analysis.

To  provide  insight  to  the  designer  and  analyst  as  to  the  error  incurred  when  they  apply  this 
approximation  an  error  analysis  was  presented.  This  analysis  was  meant  to  indicate  parameter 
sensitivity  and  trends.   It  was  not  an  elaborate  analysis,  nor  were  the  results  conclusive.  Further 
work  is  needed  to  characterize  the  error  incurred  over  a  wider  range  of  the  parameter  values, 
especially  the  job  allocation  vector,  J,  and  the  number  of  ICSs,  R. 

The BIN algorithm presented in chapter IV provided an efficient solution to the approximate
model when the system is balanced (identical ICSs). Additional work is needed to establish an
efficient algorithm for the general case of an unbalanced system, because this system requires
solution of a set of simultaneous nonlinear equations. Existing solution techniques should be
investigated, including converging iterative ones, as is the BIN algorithm. Also, a similar error
analysis should be done to determine whether an unbalanced system results in different accuracy
patterns or different parameter sensitivity than a balanced system.

In certain situations the error incurred by using the M/M/1 approximation is not satisfactory.
The alternative is to utilize the exact model, but the computation time required may be excessive.
Similar models based on other queueing systems may provide results with lower error
and less variation. It would be useful to formulate efficient solution algorithms and perform
similar error analyses for them, as was done for the M/M/1 approximation. Assuming these other
approximations yield values significantly closer to the exact values, the results of a modular
expansion analysis would be useful. The results of this analysis would be interesting to compare
to the results obtained from the M/M/1 approximation.

A greatest lower bound for the exact model should be pursued. The one established here is
too high to be of any practical value. A value of unity was conjectured for the exact model, but
not proven, in this dissertation. Proving either this conjecture or the fractional bound (p),
established through the analysis of these approximate models, would be a useful endeavor.

There  are  other  research  areas  that  may  be  pursued  based  on  these  modeling  techniques. 
One  is  formulation  of  more  efficient  exact  and  approximate  models  for  architectures  with  multiple 
SPRs.   These  models  would  be  especially  applicable  to  architectures  incorporating  a  LAN 
allowing  high  speed  access  to  multiple  shared  subsystems. 

Another research direction would be to establish non-exponential approximate models.
These could be based on M/G/1 servers or M/G/1/K servers. The utility of these models could
be established by comparing their predictions to those of Shum's EPF model [SHUM 76,
SHUM 77]. The EPF model is an approximation based on a queueing network formulation, in
which the product terms are replaced by terms representing M/G/1/K servers. The computation
time and memory-space complexity of the EPF model are equivalent to those of general queueing
network theory. An approximate non-exponential model of time and space complexity
comparable to that of the approximate SCS model would be useful.

APPENDIX A
Review  of  Balance  Equations  for  Queueing  Networks 


1.  The  Single  Server  Queue 

To understand the balance, or flow, equations we shall start with a simple single server queue.
Assume the service time distribution is exponential, with mean service rate $u$, and assume the arrival process
is Poisson (i.e. has exponentially distributed interarrival intervals), with mean arrival rate $a$. Let the
probability of $n$ jobs in the queue at time $t$ be

$$\Pr[n(t)] = \Pr[X(t)=n], \qquad t > 0,$$

where $X(t)$ is a random variable and $n$ is the state of the system, which is the number of jobs in the
queue (including any being serviced). Let us also assume that in a small time interval, $\Delta t$, at most
one event can occur. Therefore, the state probability balance equation is

$$
\begin{aligned}
\Pr[n(t+\Delta t)] ={}& \Pr[n(t)]\,\Pr[\text{no arrivals and no departures in } \Delta t \mid n \text{ at } t] \\
&+ \Pr[n(t)-1]\,\Pr[\text{1 arrival and no departures in } \Delta t \mid n-1 \text{ at } t] \\
&+ \Pr[n(t)+1]\,\Pr[\text{no arrivals and 1 departure in } \Delta t \mid n+1 \text{ at } t] \\
&+ \Pr[\Delta t^2],
\end{aligned}
$$

where $\Pr[\Delta t^2]$ is the collective probability that more than one event occurs during $\Delta t$. We
make $\Delta t$ small enough so that this probability is essentially zero, $\Pr[\Delta t^2] \approx 0$.

Since the arrival and departure processes are exponential (i.e. memoryless), we are able to
remove the conditions in the above equation, which yields

$$
\begin{aligned}
\Pr[n(t+\Delta t)] ={}& \Pr[n(t)]\,\Pr[\text{no arrivals and no departures in } \Delta t] \\
&+ \Pr[n(t)-1]\,\Pr[\text{1 arrival and no departures in } \Delta t] \\
&+ \Pr[n(t)+1]\,\Pr[\text{no arrivals and 1 departure in } \Delta t].
\end{aligned}
\tag{1}
$$

By definition the probability of each process occurring is

$$\Pr[\text{an arrival occurs in } \Delta t] = 1 - e^{-a\Delta t}, \quad\text{and}$$

$$\Pr[\text{a departure occurs in } \Delta t] = 1 - e^{-u\Delta t}.$$


Expanding the exponential in an infinite series and truncating after the second term yields

$$e^{-a\Delta t} = 1 - a\Delta t + \frac{(a\Delta t)^2}{2!} - \frac{(a\Delta t)^3}{3!} + \cdots \approx 1 - a\Delta t.$$

This truncation may occur since for small $\Delta t$, $\Delta t^2 \ll \Delta t$, and as $\Delta t$ approaches zero in the limit, $\Delta t^2$
approaches zero much more rapidly. Therefore, all the higher order terms are negligible
compared to the first order term. Substituting this back into the probability definitions we obtain
the following approximations

$$\Pr[\text{an arrival occurs in } \Delta t] \approx a\Delta t, \quad\text{and}$$

$$\Pr[\text{a departure occurs in } \Delta t] \approx u\Delta t,$$

and conversely

$$\Pr[\text{no arrivals occur in } \Delta t] \approx 1 - a\Delta t, \quad\text{and}$$

$$\Pr[\text{no departures occur in } \Delta t] \approx 1 - u\Delta t.$$
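
The quality of the first-order truncation can be checked numerically. The following is a minimal sketch, not part of the original derivation; the rate value and the function names are illustrative assumptions. It compares the exact probability $1 - e^{-a\Delta t}$ with the estimate $a\Delta t$ and shows that the gap shrinks like $(a\Delta t)^2/2$.

```python
import math


def event_probability(rate: float, dt: float) -> float:
    """Exact probability that at least one exponential event occurs within dt."""
    # -expm1(-x) equals 1 - exp(-x) but keeps precision for very small x.
    return -math.expm1(-rate * dt)


def first_order_estimate(rate: float, dt: float) -> float:
    """First-order (truncated series) estimate of the same probability."""
    return rate * dt


def truncation_error(rate: float, dt: float) -> float:
    """Estimate minus exact value; bounded above by (rate * dt)**2 / 2."""
    return first_order_estimate(rate, dt) - event_probability(rate, dt)


if __name__ == "__main__":
    a = 2.0  # hypothetical arrival rate, chosen only for illustration
    for dt in (0.1, 0.01, 0.001):
        print(dt, event_probability(a, dt), first_order_estimate(a, dt),
              truncation_error(a, dt))
```

Each tenfold reduction of the interval reduces the error by roughly a factor of one hundred, which is the sense in which the higher order terms are negligible.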

Noting that the arrival and departure processes are independent of each other allows us to
express their joint probability as a product of the two probabilities, which as above may be
approximated by the first order term, as

$$
\begin{aligned}
\Pr[\text{no arrivals and no departures in } \Delta t] &= (1-a\Delta t)\,(1-u\Delta t) \\
&= 1 - a\Delta t - u\Delta t + au(\Delta t)^2 \\
&\approx 1 - (a+u)\Delta t.
\end{aligned}
$$

Applying this truncation again allows us to express the following probabilities as

$$
\begin{aligned}
\Pr[\text{1 arrival and no departures in } \Delta t] &= a\Delta t\,(1-u\Delta t) \\
&= a\Delta t - au(\Delta t)^2 \\
&\approx a\Delta t, \quad\text{and}
\end{aligned}
$$

$$
\begin{aligned}
\Pr[\text{no arrivals and 1 departure in } \Delta t] &= (1-a\Delta t)\,u\Delta t \\
&= u\Delta t - au(\Delta t)^2 \\
&\approx u\Delta t.
\end{aligned}
$$


Substituting these results into equation (1) yields

$$\Pr[n(t+\Delta t)] = \Pr[n(t)]\,(1-(a+u)\Delta t) + \Pr[n(t)-1]\,a\Delta t + \Pr[n(t)+1]\,u\Delta t. \tag{2}$$

Assuming an infinite size queue and, therefore, no upper limit on the number of jobs, a lower
limit must still be considered since it is physically impossible to have less than an empty queue.
This results in the following boundary equation for an empty queue (i.e. state = "0")

$$\Pr[0(t+\Delta t)] = \Pr[0(t)]\,(1-a\Delta t) + \Pr[1(t)]\,u\Delta t. \tag{3}$$

When the queue is empty no departures can occur and there exists no state = "-1".
Rearranging equations (2) and (3) algebraically yields

$$\Pr[n(t+\Delta t)] = \Pr[n(t)+1]\,u\Delta t - \Pr[n(t)]\,(a+u)\Delta t + \Pr[n(t)-1]\,a\Delta t + \Pr[n(t)], \quad\text{and}$$

$$\Pr[0(t+\Delta t)] = \Pr[1(t)]\,u\Delta t - \Pr[0(t)]\,a\Delta t + \Pr[0(t)].$$

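Then moving the current state probability to the left side, dividing by $\Delta t$, and taking the limit as $\Delta t$ approaches zero results in

$$\lim_{\Delta t \to 0} \frac{\Pr[n(t+\Delta t)] - \Pr[n(t)]}{\Delta t} = \Pr[n(t)+1]\,u - \Pr[n(t)]\,(a+u) + \Pr[n(t)-1]\,a, \quad\text{and} \tag{4}$$

$$\lim_{\Delta t \to 0} \frac{\Pr[0(t+\Delta t)] - \Pr[0(t)]}{\Delta t} = \Pr[1(t)]\,u - \Pr[0(t)]\,a. \tag{5}$$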

Note that these equations are the derivatives ($\Pr'[n]$, etc.) of the state probabilities (that is, the
rate of change of the state probabilities). Assuming a stationary distribution implies

$$\Pr[n(t)] = \Pr[n].$$

The average rate of change must be zero, or an infinite accumulation in some state might
occur. Therefore, equations (4) and (5) are assumed to be identically zero, resulting in

$$\Pr'[n] = 0 = \Pr[n+1]\,u - \Pr[n]\,(a+u) + \Pr[n-1]\,a, \quad\text{and}$$

$$\Pr'[0] = 0 = \Pr[1]\,u - \Pr[0]\,a.$$

By rearranging these equations we obtain

$$\Pr[n]\,(a+u) = \Pr[n+1]\,u + \Pr[n-1]\,a, \quad\text{and} \tag{6}$$

$$\Pr[0]\,a = \Pr[1]\,u. \tag{7}$$

From studying these equations we realize that the left side is the (flow) rate out of a state, while
the right side is the rate into that state. Therefore, these equations represent the balance of flow
between states, hence the term balance (or flow) equations.
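
As an illustration of equations (6) and (7), the sketch below evaluates their residuals for the well-known geometric stationary distribution of the single server queue, $\Pr[n] = (1 - a/u)(a/u)^n$. That closed form is standard but is not derived in this appendix, and the rates and function names are assumptions made for the example.

```python
def mm1_stationary(a: float, u: float, n_max: int) -> list[float]:
    """Stationary probabilities Pr[0], ..., Pr[n_max] of a single server queue.

    Uses the standard geometric solution, which requires a < u (stable queue).
    """
    rho = a / u
    if not 0.0 <= rho < 1.0:
        raise ValueError("the queue is stable only when a < u")
    return [(1.0 - rho) * rho ** n for n in range(n_max + 1)]


def balance_residuals(p: list[float], a: float, u: float) -> list[float]:
    """Rate out of each state minus rate into it, per equations (7) and (6)."""
    residuals = [p[0] * a - p[1] * u]  # boundary equation (7)
    for n in range(1, len(p) - 1):
        flow_out = p[n] * (a + u)
        flow_in = p[n + 1] * u + p[n - 1] * a
        residuals.append(flow_out - flow_in)  # equation (6)
    return residuals


if __name__ == "__main__":
    a, u = 3.0, 5.0  # hypothetical arrival and service rates
    p = mm1_stationary(a, u, 50)
    print(max(abs(r) for r in balance_residuals(p, a, u)))
```

All residuals vanish up to rounding error, confirming that the flow out of every state equals the flow into it.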


2.  Open  and  Closed  Queueing  Networks 

To obtain the balance equations for a network of queues (i.e. a Jacksonian network [JACKSO
63]) the same basic procedures are necessary. We now have an additional problem of many
independent queues connected to each other. This changes our scalar state $n$ for one queue to a
vector $N = (n_0, \ldots, n_s)$ to account for each of the $s+1$ queues. The state equilibrium probability
distribution becomes

$$\Pr[N] = \Pr[n_0, \ldots, n_s],$$

where

$$K = \sum_{i=0}^{s} n_i,$$

and $n_i$ is the number of jobs in the $i$-th queue, and $K$ is the total number of jobs in the network.
Similarly, the mean arrival and departure rates become vectors also, one element for each queue,
$a_i$ and $u_i$. To specify how these queues are connected, we use constant transition probabilities, $p_{j,i}$, to
indicate the flow (transition) of one job from queue $j$ to queue $i$. Without proceeding through
the tedious algebra (as in the previous section) we may write [JACKSO 63] the left side as

$$\Bigl(\sum_{i=0}^{s} a_i + \sum_{i=0}^{s} u_i\,y(n_i)\Bigr)\Pr[N],$$

where

$$y(n_i) = \begin{cases} 0, & n_i \le 0 \\ 1, & n_i > 0. \end{cases}$$

Note that the extra factor $y(n_i)$ is equivalent to accounting for our "less than empty" boundary
conditions, since some queues may be empty while others are not. The right side is a little more
complex. We are looking for all the ways to reach the destination state $N$ through a single
transition from a source state, $N'$. Any queue $i$ may receive a job from any other queue $j$;
therefore, the source state must be of the form $N' = (n_0, \ldots, n_j+1, \ldots, n_i-1, \ldots, n_s)$. All possible
combinations of source states of this form must be accounted for on the right side of the equation,
which yields

$$\sum_{j=0}^{s}\sum_{i=0}^{s} u_j\,y(n_i)\,p_{j,i}\,\Pr[n_0, \ldots, n_j+1, \ldots, n_i-1, \ldots, n_s] + \sum_{i=0}^{s} a_i\,y(n_i)\,\Pr[n_0, \ldots, n_i-1, \ldots, n_s].$$
i=0 


These portions yield the following for an open queueing network with exponential service and
interarrival times, and an infinite queueing capacity [JACKSO 63]:

$$
\begin{aligned}
\Bigl[\sum_{i=0}^{s} a_i + \sum_{i=0}^{s} u_i\,y(n_i)\Bigr]\Pr[n_0, \ldots, n_s]
={}& \sum_{i=0}^{s}\sum_{j=0}^{s} u_j\,p_{j,i}\,y(n_i)\,\Pr[n_0, \ldots, n_j+1, \ldots, n_i-1, \ldots, n_s] \\
&+ \sum_{i=0}^{s} a_i\,y(n_i)\,\Pr[n_0, \ldots, n_i-1, \ldots, n_s].
\end{aligned}
\tag{8}
$$

For a closed queueing network with a constant number of jobs, $K$, continually circulating
[GORDON 67] there is no arrival process, which implies $a_i = 0$ for all $i$. The resulting equation
is

$$\Bigl[\sum_{i=0}^{s} u_i\,y(n_i)\Bigr]\Pr[n_0, \ldots, n_s] = \sum_{i=0}^{s}\sum_{j=0}^{s} u_j\,p_{j,i}\,y(n_i)\,\Pr[n_0, \ldots, n_j+1, \ldots, n_i-1, \ldots, n_s]. \tag{9}$$

It  should  be  noted  that  our  approach  to  arrive  at  equations  (8)  and  (9)  was  an  intuitive  extension 
of  equations  (6)  and  (7),  the  balance  equations  for  a  single  server  queue.  A  more  rigorous 
approach  would  require  a  procedure  similar  to  that  from  which  equations  (6)  and  (7)  were 
derived  [JACKSO  57,  JACKSO  63,  GORDON  67]. 

The solution to these equations is based on assuming a product form solution [GORDON 67]
of the form

$$\Pr[n_0, \ldots, n_s] = \frac{1}{G(K)}\prod_{k=0}^{s} x_k^{n_k}, \quad\text{and} \tag{10a}$$

$$\Pr[n_0, \ldots, n_j+1, \ldots, n_i-1, \ldots, n_s] = \frac{x_j}{x_i}\,\frac{1}{G(K)}\prod_{k=0}^{s} x_k^{n_k}, \tag{10b}$$

where $G(K)$ is a probability normalization factor and the $x_k$'s are unknowns whose solution
must be obtained. Substituting the solution form of (10) back into (9) yields

$$\sum_{i=0}^{s} u_i\,y(n_i)\,\frac{1}{G(K)}\prod_{k=0}^{s} x_k^{n_k} = \Bigl[\sum_{i=0}^{s}\sum_{j=0}^{s} u_j\,p_{j,i}\,y(n_i)\,\frac{x_j}{x_i}\Bigr]\frac{1}{G(K)}\prod_{k=0}^{s} x_k^{n_k}.$$

The term on the right-hand side of the equation, outside the brackets, is always nonzero and
is a term common to both sides. This term is not dependent on the indices $i$ or $j$ and, therefore,
may be manipulated without affecting the summations. Extracting and cancelling this term results
in

$$\sum_{i=0}^{s} u_i\,y(n_i) = \sum_{i=0}^{s}\sum_{j=0}^{s} u_j\,p_{j,i}\,y(n_i)\,\frac{x_j}{x_i}.$$

Further extracting the common term $y(n_i)$ yields

$$\sum_{i=0}^{s} y(n_i)\Bigl[u_i - \sum_{j=0}^{s} u_j\,p_{j,i}\,\frac{x_j}{x_i}\Bigr] = 0. \tag{11}$$


The next step in the solution of this closed queueing network is based on the fact that at any time
it is possible for all but one queue to be empty. Let this single non-empty queue be the $k$-th
queue. In this case

$$y(n_i) = \begin{cases} 0, & i \ne k \\ 1, & i = k. \end{cases}$$

Hence, (11) is reduced to

$$u_i - \sum_{j=0}^{s} u_j\,p_{j,i}\,\frac{x_j}{x_i} = 0, \quad\text{for } i = 0, \ldots, s.$$

By rearranging the above the following results are obtained:

$$x_i u_i - \sum_{j=0}^{s} x_j u_j\,p_{j,i} = 0.$$

Letting $e_i = x_i u_i$; then

$$e_i = \sum_{j=0}^{s} e_j\,p_{j,i}, \tag{12}$$

which is the vector equation $E = E[P]$, where the vector $E = (e_0, \ldots, e_s)$ and the matrix $[P] = [p_{j,i}]$.
The $e_i$'s are usually called the relative visit frequencies. Finally we can solve for the probability
normalization factor, by using the fact that the probabilities must sum to unity. Hence, from (10)
we may formulate

$$\sum_{N}\Pr[n_0, \ldots, n_s] = \frac{1}{G(K)}\sum_{N}\prod_{i=0}^{s} x_i^{n_i} = 1.$$

Therefore,

$$G(K) = \sum_{N}\prod_{i=0}^{s} x_i^{n_i}, \tag{13}$$

where the summation over $N$ implies all non-negative solutions to the equation

$$\sum_{i=0}^{s} n_i = K.$$

See [KLIENR 76], pg. 216.

In summary, the state equilibrium probabilities can be obtained from the following:

$$\Pr[n_0, \ldots, n_s] = \frac{1}{G(K)}\prod_{i=0}^{s} x_i^{n_i}, \quad\text{where}\quad G(K) = \sum_{N}\prod_{i=0}^{s} x_i^{n_i}.$$

3.  The  Central  Server  Model 

A simple, nontrivial case of a closed queueing network is called the Central Server Model [BUZEN
71]. A diagram of this model is shown in figure A-1 and the corresponding transition probability
matrix is structured as shown in figure A-2.

Figure A-2. Central Server model transition probability matrix.

This results in a simple solution to (12), which
consists of $s+1$ simultaneous equations with $s+1$ unknowns. In this case there are only $s$
independent equations. Therefore, one can solve all these equations in terms of one of the
unknowns, say $e_0$. This does not result in a unique (absolute) solution, but rather a relative
solution. Consequently, any value substituted for $e_0$ will yield a solution satisfying (12).
Therefore, this relative solution is related to the absolute solution by some multiplicative
constant. The relative solution for the Central Server Model is

$$E = (e_0,\; e_0 p_{0,1},\; \ldots,\; e_0 p_{0,s}).$$

Note that the first subscript ($j$) of the $p_{j,i}$'s denotes the source of the transition. There is only a
single source in this model, the central server. Therefore, this subscript may be dropped since it is
implied, resulting in

$$E = (e_0,\; e_0 p_1,\; \ldots,\; e_0 p_s).$$

Letting $x_0 = 1$, which then implies $e_0 = u_0 x_0 = u_0$, results in

$$E = (u_0,\; u_0 p_1,\; \ldots,\; u_0 p_s)$$

and

$$X = \Bigl(1,\; \frac{u_0 p_1}{u_1},\; \ldots,\; \frac{u_0 p_s}{u_s}\Bigr).$$
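
The relative solution can be checked against (12) with a few lines of code. Figure A-2 is not reproduced in this text, so the matrix structure used below is an assumption: the central server (queue 0) branches to queue $i$ with probability $p_i$, where $p_0$ is the probability of returning directly to the central server, and every other queue returns its jobs to the central server with probability one. The numeric values and function names are likewise hypothetical.

```python
def central_server_matrix(p):
    """Transition matrix P[j][i] under the assumed central server structure."""
    size = len(p)
    matrix = [[0.0] * size for _ in range(size)]
    matrix[0] = list(p)            # central server branches with probabilities p_i
    for j in range(1, size):
        matrix[j][0] = 1.0         # every other queue feeds the central server
    return matrix


def relative_solution(p, u):
    """Return (E, X) with x_0 = 1, so that e_0 = u_0 and e_i = u_0 * p_i."""
    e = [u[0]] + [u[0] * p[i] for i in range(1, len(p))]
    x = [e[i] / u[i] for i in range(len(u))]
    return e, x


if __name__ == "__main__":
    p = (0.1, 0.6, 0.3)            # hypothetical branching probabilities
    u = (10.0, 4.0, 2.0)           # hypothetical service rates
    e, x = relative_solution(p, u)
    print(e, x)
```

Multiplying $E$ by any positive constant gives another valid solution of (12); the choice $x_0 = 1$ merely fixes the scale.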


The structure of the central server model results in a simple solution to (12). The evaluation
of (13), although straightforward, requires a summation over a state-space whose size increases
exponentially, $O[(K+1)^s]$. Buzen [BUZEN 71] developed an iterative method for evaluating
(13) that requires a computational complexity of $O[Ks]$, vs. $O[(K+1)^s]$. This method is based on
a recursive partitioning technique. Define the following auxiliary function

$$g(m,j) = \sum_{\sum_{i=0}^{j} n_i = m}\;\prod_{i=0}^{j} x_i^{n_i}. \tag{14}$$

Note that when $m = K$ and $j = s$, then $G(K) = g(K,s)$. Partitioning $g(m,j)$ based on the
occupancy of the last queue, $j$, being either empty or not, yields

$$g(m,j) = \sum_{\sum_{i=0}^{j-1} n_i = m}\;\prod_{i=0}^{j-1} x_i^{n_i} \;+\; x_j \sum_{\sum_{i=0}^{j} n_i = m-1}\;\prod_{i=0}^{j} x_i^{n_i} = g(m,j-1) + x_j\,g(m-1,j), \tag{15}$$

with the following boundary conditions

$$g(0,j) = 1, \quad 0 \le j \le s, \quad\text{and}$$

$$g(k,0) = x_0^{k}, \quad 0 \le k \le K.$$

Equation (14) can be evaluated for any values of $m$ and $j$ using the boundary conditions above, or
any other. By conceptualizing $g(m,j)$ as a matrix, Buzen presents an efficient iterative algorithm
for evaluating (14). As a result, the normalization constant (13) and expressions based on it can be
evaluated much more efficiently than was previously possible.
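
Recursion (15) and its boundary conditions translate directly into a short routine. The following is a minimal sketch rather than Buzen's published program: it keeps only one column of the $g(m,j)$ matrix at a time, and the function names and sample load factors are assumptions. A brute-force evaluation of (13) is included only to cross-check the result on a small case.

```python
from itertools import product
from math import prod


def buzen(x, jobs):
    """Return [G(0), ..., G(jobs)] for relative load factors x via recursion (15)."""
    g = [x[0] ** m for m in range(jobs + 1)]      # column j = 0: g(m,0) = x_0 ** m
    for xj in x[1:]:                              # add one queue (column) at a time
        for m in range(1, jobs + 1):
            # g[m] still holds g(m, j-1); g[m-1] already holds g(m-1, j).
            g[m] = g[m] + xj * g[m - 1]
    return g


def brute_force(x, jobs):
    """G(jobs) by direct summation of (13); exponential cost, for checking only."""
    total = 0.0
    for head in product(range(jobs + 1), repeat=len(x) - 1):
        rest = jobs - sum(head)
        if rest >= 0:
            total += prod(xi ** ni for xi, ni in zip(x, head + (rest,)))
    return total


if __name__ == "__main__":
    x = (1.0, 1.5, 1.5)        # hypothetical load factors with x_0 = 1
    print(buzen(x, 2))         # G(0), G(1), G(2)
    print(brute_force(x, 2))
```

The routine performs one multiplication and one addition per matrix entry, and it returns the intermediate values $G(0), \ldots, G(K-1)$ as a by-product of computing $G(K)$.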

Buzen also derived expressions for some performance measures which are based on a similar
product form structure and, therefore, may be efficiently evaluated by the above iterative
technique. These performance measures include the device busy probabilities and the mean
queue length for the central server model. The device busy probability, $A_j$, is the probability that
the $j$-th queue is not empty and can be expressed as

$$
A_j = \sum_{N:\,n_j \ge 1}\Pr[n_0, \ldots, n_s]
= \sum_{\substack{\sum_{i=0}^{s} n_i = K \\ n_j \ge 1}}\Bigl[\frac{1}{G(K)}\prod_{i=0}^{s} x_i^{n_i}\Bigr]
= \frac{x_j}{G(K)}\sum_{\sum_{i=0}^{s} n_i = K-1}\;\prod_{i=0}^{s} x_i^{n_i}
= x_j\,\frac{G(K-1)}{G(K)}.
$$

To derive the mean queue length of the $j$-th queue, $Q_j$, first define $R_j(k)$ as the probability
that the $j$-th queue has $k$ or more jobs (i.e. $n_j \ge k$), expressed as

$$
R_j(k) = \sum_{N:\,n_j \ge k}\Pr[n_0, \ldots, n_s]
= \sum_{\substack{\sum_{i=0}^{s} n_i = K \\ n_j \ge k}}\Bigl[\frac{1}{G(K)}\prod_{i=0}^{s} x_i^{n_i}\Bigr]
= \frac{x_j^{k}}{G(K)}\sum_{\sum_{i=0}^{s} n_i = K-k}\;\prod_{i=0}^{s} x_i^{n_i}.
$$

Hence,

$$R_j(k) = x_j^{k}\,\frac{G(K-k)}{G(K)}.$$

Note that $R_j(0) = 1$, $R_j(1) = A_j$, $R_j(K+1) = 0$, and the probability that the $j$-th queue has
exactly $k$ jobs is $R_j(k) - R_j(k+1)$. Therefore, the mean queue length is

$$
\begin{aligned}
Q_j &= \sum_{k=1}^{K} k\,[R_j(k) - R_j(k+1)] = \sum_{k=1}^{K} k\,R_j(k) - \sum_{k=1}^{K} k\,R_j(k+1) \\
&= \sum_{k=1}^{K} k\,R_j(k) - \Bigl[\sum_{k=1}^{K} (k-1)\,R_j(k) + K\,R_j(K+1)\Bigr] \\
&= R_j(1) + \sum_{k=2}^{K} k\,R_j(k) - \sum_{k=2}^{K} (k-1)\,R_j(k) \\
&= R_j(1) + \sum_{k=2}^{K} [k-(k-1)]\,R_j(k) \\
&= \sum_{k=1}^{K} R_j(k) \\
&= \sum_{k=1}^{K} x_j^{k}\,\frac{G(K-k)}{G(K)}.
\end{aligned}
$$
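
Once the values $G(0), \ldots, G(K)$ are available, both performance measures reduce to a few arithmetic operations. The sketch below is an illustration under assumed names and sample load factors; it recomputes the normalization constants with recursion (15) so that it is self-contained, and it uses the fact that the mean queue lengths must add up to $K$ as a consistency check.

```python
def normalization_constants(x, jobs):
    """Return [G(0), ..., G(jobs)] using recursion (15), one column at a time."""
    g = [x[0] ** m for m in range(jobs + 1)]
    for xj in x[1:]:
        for m in range(1, jobs + 1):
            g[m] = g[m] + xj * g[m - 1]
    return g


def busy_probability(x, j, g):
    """A_j = x_j * G(K-1) / G(K), where g = [G(0), ..., G(K)] and K >= 1."""
    return x[j] * g[-2] / g[-1]


def mean_queue_length(x, j, g):
    """Q_j = sum over k = 1..K of x_j ** k * G(K-k) / G(K)."""
    jobs = len(g) - 1
    return sum(x[j] ** k * g[jobs - k] for k in range(1, jobs + 1)) / g[jobs]


if __name__ == "__main__":
    x = (1.0, 1.5, 1.5)        # hypothetical load factors with x_0 = 1
    jobs = 2
    g = normalization_constants(x, jobs)
    for j in range(len(x)):
        print(j, busy_probability(x, j, g), mean_queue_length(x, j, g))
```

Every job is always in exactly one queue, so the printed mean queue lengths sum to the number of jobs in the network.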

4. Closed Queueing Networks with Multiple Job Classes

Baskett et al. [BASKET 75] have extended the theory of queueing networks to include
multiple classes of jobs, with other than exponential service time probability distributions. This
extended model is based on expanding the state description, the state transition probabilities, and
the mean service rates to include class distinction. Define the following (Note: these definitions
differ somewhat from those used in the rest of this dissertation, see Appendix C, but they conform
to the definitions generally associated with multi-class queueing network theory):

$N = (N_0, \ldots, N_s) = (n_{0,1}, \ldots, n_{0,R},\; \ldots,\; n_{s,1}, \ldots, n_{s,R})$ is the state-space vector description;

$n_{i,r}$ = number of class $r$ jobs at the $i$-th service center (both in service and in queue);

$n_i = \sum_{r=1}^{R} n_{i,r}$, the total number of jobs (of all classes) at the $i$-th service center;

$N_i = (n_{i,1}, \ldots, n_{i,R})$ is the job class distribution at the $i$-th service center;

$K = \sum_{i=0}^{s} n_i = \sum_{i=0}^{s}\sum_{r=1}^{R} n_{i,r}$, the total number of jobs in the network;

$J = (J_1, \ldots, J_R)$ is the job class allocation vector for the network;

$J_r$ = the maximum number of jobs that can be allocated to the $r$-th class,

$J_r = \sum_{i=0}^{s} n_{i,r}$ is the total number of class $r$ jobs in the network;

$u_{i,r}$ = the mean service rate of a class $r$ job at the $i$-th service center;

$p_{i,r;j,s}$ = the probability of a class $r$ job at the $i$-th service center transiting to the $j$-th
service center and becoming a class $s$ job;

$R$ = the number of different job classes in the network.

The state equilibrium probabilities are assumed to have the form

$$\Pr[n_{0,1}, \ldots, n_{s,R}] = \frac{1}{G(J)}\prod_{i=0}^{s} f_i(N_i),$$

where the product factors are given by

$$
f_i(N_i) = \begin{cases}
n_i!\,\displaystyle\prod_{r=1}^{R}\frac{x_{i,r}^{\,n_{i,r}}}{n_{i,r}!}, & \text{PS, FCFS, LCFS} \\[2ex]
\displaystyle\prod_{r=1}^{R}\frac{x_{i,r}^{\,n_{i,r}}}{n_{i,r}!}, & \text{IS};
\end{cases}
$$

the normalization constant is given by

$$G(J) = \sum_{N}\prod_{i=0}^{s} f_i(N_i),$$

where the summation over the state-space $N$ means all non-negative solutions to

$$\sum_{i=0}^{s} N_i = J.$$

The relative visit frequency, $e_{i,r}$, is given by the solution to

$$e_{i,r} = \sum_{j=0}^{s}\sum_{t=1}^{R} e_{j,t}\,p_{j,t;i,r},$$

and the relative load factor, $x_{i,r}$, is given by

$$x_{i,r} = \frac{e_{i,r}}{u_{i,r}}.$$

Buzen's  efficient  computational  procedures  have  been  generalized  by  Muntz  and  Wong 
[MUNTZ  74],  Giammo  [GIAMMO  76],  and  Shum  [SHUM  77]  for  networks  with  a  constant 
number  of  jobs  in  each  of  the  R  classes,  specified  by  the  vector  J.  The  efficient  iterative  solution 
technique  of  (15)  for  a  single  class  of  jobs  is  based  on  a  two  dimensional  conceptualization,  jobs  (a 
scalar)  by  devices.  A  direct  extension  of  this  two  dimensional  technique  for  multiple  classes  of 
jobs  requires  the  substitution  of  a  job  class  distribution  vector  for  the  previous  scalar  job 
specification. Defining an auxiliary function, as in (14), results in

$$g(M;k) = \sum_{\sum_{i=0}^{k} N_i = M}\;\prod_{i=0}^{k}\Bigl[n_i!\prod_{r=1}^{R}\frac{x_{i,r}^{\,n_{i,r}}}{n_{i,r}!}\Bigr]. \tag{16}$$

Following a recursive partitioning procedure similar to that used to obtain (15), yields the
following recursive defining relation for the auxiliary function [SHUM 76]

$$g(M;k) = g(M;k-1) + \sum_{r=1}^{R} x_{k,r}\,g(M-d_r;k). \tag{17}$$

Equation (16) may also be expressed as [MUNTZ 74]

$$g(M;k) = \sum_{n_{k,1}=0}^{m_1}\cdots\sum_{n_{k,R}=0}^{m_R} f_k(N_k)\,g(M-N_k;k-1), \tag{18}$$

where $M$ is a vector representing a job class distribution such that

$$M = (m_1, \ldots, m_R), \quad m_i = \text{the number of class } i \text{ jobs, such that } m_i \le J_i,$$

and $d_r$ is a difference vector such that

$$d_r = (b_1, \ldots, b_R), \quad b_i = \begin{cases} 0, & i \ne r \\ 1, & i = r, \end{cases} \quad i = 1, \ldots, R.$$

The boundary conditions for this auxiliary function are

$$g(0;k) = 1, \quad 0 \le k \le s, \quad\text{and}$$

$$g(M;0) = f_0(M), \quad 0 \le M \le J.$$

The probability normalization constant is obtained by summing over the entire state space,
which can be expressed in terms of the auxiliary function as [MUNTZ 74]

$$
\begin{aligned}
G(J) &= \sum_{n_{s,1}=0}^{J_1}\cdots\sum_{n_{s,R}=0}^{J_R} n_s!\prod_{r=1}^{R}\frac{x_{s,r}^{\,n_{s,r}}}{n_{s,r}!}\Bigl\{\sum_{\sum_{i=0}^{s-1} N_i = J-N_s}\;\prod_{i=0}^{s-1} n_i!\prod_{r=1}^{R}\frac{x_{i,r}^{\,n_{i,r}}}{n_{i,r}!}\Bigr\} \\
&= \sum_{n_{s,1}=0}^{J_1}\cdots\sum_{n_{s,R}=0}^{J_R}\bigl[f_s(N_s)\,\{g(J-N_s;s-1)\}\bigr] \\
&= g(J;s),
\end{aligned}
\tag{19}
$$

where

$$g(N_0;0) = f_0(N_0),$$

$$f_i(N_i) = \sum_{r=1}^{R} x_{i,r}\,f_i(N_i-d_r), \quad\text{and}$$

$$f_i(0) = 1.$$

This completes the review of the basic equations for current queueing network theory.
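
Recursion (17) can be exercised on a small example and compared with the defining sum (16). The sketch below is illustrative only and makes several assumptions: every service center is of the PS, FCFS, or LCFS type (so the $n_i!$ factor applies), `x[i][r]` holds the relative load factor $x_{i,r}$, and the function names and sample values are hypothetical. Memoization stands in for the tabular storage schemes of the cited algorithms.

```python
from functools import lru_cache
from itertools import product
from math import factorial, prod


def f(x_i, n_i):
    """Product factor f_i(N_i) for a PS/FCFS/LCFS center."""
    return factorial(sum(n_i)) * prod(x ** n / factorial(n) for x, n in zip(x_i, n_i))


def g_recursive(x, J):
    """G(J) = g(J; s) via recursion (17) with the stated boundary conditions."""
    classes = len(J)

    @lru_cache(maxsize=None)
    def g(M, k):
        if k == 0:
            return f(x[0], M)              # g(M;0) = f_0(M)
        if not any(M):
            return 1.0                     # g(0;k) = 1
        total = g(M, k - 1)
        for r in range(classes):
            if M[r] > 0:                   # M - d_r must stay non-negative
                reduced = M[:r] + (M[r] - 1,) + M[r + 1:]
                total += x[k][r] * g(reduced, k)
        return total

    return g(tuple(J), len(x) - 1)


def splits(total, parts):
    """Yield every way to place `total` jobs in `parts` centers."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in splits(total - first, parts - 1):
            yield (first,) + rest


def g_bruteforce(x, J):
    """G(J) by direct summation of (16); exponential cost, for checking only."""
    centers, classes = len(x), len(J)
    total = 0.0
    # alloc[r][i] is the number of class r jobs at center i.
    for alloc in product(*(list(splits(J[r], centers)) for r in range(classes))):
        total += prod(f(x[i], tuple(alloc[r][i] for r in range(classes)))
                      for i in range(centers))
    return total


if __name__ == "__main__":
    x = [[1.0, 1.0], [0.5, 2.0], [1.5, 0.25]]   # hypothetical load factors
    J = (2, 1)                                   # hypothetical class populations
    print(g_recursive(x, J), g_bruteforce(x, J))
```

The memo table holds one entry per pair $(M, k)$ with $0 \le M \le J$, which illustrates why the storage needed by the general multi-class recursion grows with the product of the class populations.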

The  following  is  provided  to  demonstrate  the  details  of  the  partitioning  procedure  used  to 
obtain  (17).  We  will  only  use  a  two  queue  system  to  minimize  the  length  of  the  presentation, 
while  still  demonstrating  the  basic  concepts.  First,  we  expand  the  function  definition  of  (16)  to 
obtain  the  recursive  relation.  Then  as  an  alternative  approach  we  expand  the  right-hand  side  of 
(17)  by  substituting  (18) ,  and  show  that  the  left-hand  side  of  (17)  is  obtained. 

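From (16) we obtain

$$
\begin{aligned}
g(N_0;0) = f_0(N_0) &= n_0!\prod_{r=1}^{R}\frac{x_{0,r}^{\,n_{0,r}}}{n_{0,r}!} \\
&= \sum_{r=1}^{R} x_{0,r}\Bigl[(n_0-1)!\prod_{t=1}^{R}\frac{x_{0,t}^{\,(n_{0,t}-d_{r,t})}}{(n_{0,t}-d_{r,t})!}\Bigr],
\end{aligned}
$$

where $d_{r,t}$ denotes the $t$-th component of the difference vector $d_r$. Therefore,

$$g(N_0;0) = \sum_{r=1}^{R} x_{0,r}\,g(N_0-d_r;0).$$

From (18), substituting this definition, we also obtain

$$
\begin{aligned}
g(M;1) &= \sum_{n_{1,1}=0}^{m_1}\cdots\sum_{n_{1,R}=0}^{m_R}\Bigl\{n_1!\prod_{r=1}^{R}\frac{x_{1,r}^{\,n_{1,r}}}{n_{1,r}!}\Bigr\}\Bigl\{\sum_{\sum_{i=0}^{0} N_i = M-N_1}\;\prod_{i=0}^{0} n_i!\prod_{r=1}^{R}\frac{x_{i,r}^{\,n_{i,r}}}{n_{i,r}!}\Bigr\} \\
&= \sum_{n_{1,1}=0}^{m_1}\cdots\sum_{n_{1,R}=0}^{m_R} f_1(N_1)\,g(M-N_1;0),
\end{aligned}
$$

and

$$
\begin{aligned}
g(M-d_r;1) &= \sum_{n_{1,1}=0}^{m_1}\cdots\sum_{n_{1,r}=0}^{m_r-1}\cdots\sum_{n_{1,R}=0}^{m_R} n_1!\prod_{t=1}^{R}\frac{x_{1,t}^{\,n_{1,t}}}{n_{1,t}!}\;g(M-d_r-N_1;0) \\
&= \sum_{n_{1,1}=0}^{m_1}\cdots\sum_{n_{1,r}=1}^{m_r}\cdots\sum_{n_{1,R}=0}^{m_R} (n_1-1)!\prod_{t=1}^{R}\frac{x_{1,t}^{\,(n_{1,t}-d_{r,t})}}{(n_{1,t}-d_{r,t})!}\;g(M-N_1;0) \\
&= \sum_{n_{1,1}=0}^{m_1}\cdots\sum_{n_{1,r}=1}^{m_r}\cdots\sum_{n_{1,R}=0}^{m_R} f_1(N_1-d_r)\,g(M-N_1;0).
\end{aligned}
$$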


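Restructuring the above yields

$$
\begin{aligned}
g(M;1) &= f_1(0)\,g(M;0) + \sum_{0 < N_1 \le M} f_1(N_1)\,g(M-N_1;0) \\
&= g(M;0) + \sum_{0 < N_1 \le M}\Bigl[\sum_{r=1}^{R} x_{1,r}\,f_1(N_1-d_r)\Bigr]g(M-N_1;0) \\
&= g(M;0) + \sum_{r=1}^{R}\Bigl[\sum_{n_{1,1}=0}^{m_1}\cdots\sum_{n_{1,r}=1}^{m_r}\cdots\sum_{n_{1,R}=0}^{m_R} x_{1,r}\,f_1(N_1-d_r)\,g(M-N_1;0)\Bigr] \\
&= g(M;0) + \sum_{r=1}^{R} x_{1,r}\Bigl[\sum_{n_{1,1}=0}^{m_1}\cdots\sum_{n_{1,r}=1}^{m_r}\cdots\sum_{n_{1,R}=0}^{m_R} f_1(N_1-d_r)\,g(M-N_1;0)\Bigr] \\
&= g(M;0) + \sum_{r=1}^{R} x_{1,r}\,g(M-d_r;1),
\end{aligned}
$$

which is also

$$
\begin{aligned}
g(M;1) &= \{f_1(M)\,g(0;0) + f_1(M-d_r)\,g(0+d_r;0) + \cdots + f_1(0+d_r)\,g(M-d_r;0) + f_1(0)\,g(M;0)\} \\
&= \{f_1(M)\,g(0;0) + f_1(M-d_r)\,g(0+d_r;0) + \cdots + f_1(0+d_r)\,g(M-d_r;0)\} + f_1(0)\,g(M;0) \\
&= g(M;0) + \sum_{r=1}^{R} x_{1,r}\,g(M-d_r;1).
\end{aligned}
$$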


As  an  alternative  approach  we  will  expand  the  right-hand  side  of  (17),  substituting  (18)  and 
derive  the  equality  of  (17). 


R  R 

2     xlr  g(M-dr;l)  =  2     xlr{     2       ...     2       ...     2       f^-dj.)    g(M-Ni;0)  } 
r=l  r=l  nll=^        nlf=^        nlR=^ 


=  2   xlr  {^(M-d^gcao)    +  ...  +  f^gCM-d^o)} 


r  =  l 
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={f1(M)g(a;0)    +f1(M-dr)g(0  +  dr;0)  +  ...    +f1(0  +  dr)g(M-dr;0)}   , 


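and by adding the final term to complete the series and by noting that

\[
g(M;0) = f_1(\mathbf{0})\, g(M;0) , \quad \text{and} \quad f_1(\mathbf{0}) = 1
\]

we obtain

\[
\begin{aligned}
g(M;0) + \sum_{r=1}^{R} x_{1,r}\, g(M-d_r;1)
&= \{ f_1(M)\, g(\mathbf{0};0) + f_1(M-d_r)\, g(\mathbf{0}+d_r;0) + \cdots + f_1(\mathbf{0}+d_r)\, g(M-d_r;0) \} + g(M;0) \\
&= g(M;1)
\end{aligned}
\]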


Following the above procedure, it can be shown that the general case yields

\[
g(M;k) = g(M;k-1) + \sum_{r=1}^{R} x_{k,r}\, g(M-d_r;k)
\]
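
The recurrence lends itself to a table-filling computation. The following Python sketch is illustrative only and is not the algorithm of the main text: the names `fold_station`, `product_factor`, and `direct_convolution` are hypothetical, the starting table g(M;0) is assumed to be supplied by the caller, and the product factor is taken in the PS/FCFS/LCFS form listed in Appendix C. It applies the recurrence for one station and provides the direct sum of f(N) g(M-N;0) over 0 <= N <= M for comparison.

```python
from itertools import product
from math import factorial


def product_factor(N, x):
    """|N|! * prod_r x[r]**N[r] / N[r]!  (PS/FCFS/LCFS form, assumed here)."""
    value = float(factorial(sum(N)))
    for n, xr in zip(N, x):
        value *= xr ** n / factorial(n)
    return value


def fold_station(g_prev, x, J):
    """Return the table g(M;k) for all 0 <= M <= J, given g_prev = g(.;k-1).

    Uses g(M;k) = g(M;k-1) + sum_r x[r] * g(M - d_r; k).
    """
    g = {}
    # product() yields vectors in lexicographic order, so every M - d_r
    # has already been filled in when M is visited.
    for M in product(*(range(j + 1) for j in J)):
        total = g_prev[M]
        for r, xr in enumerate(x):
            if M[r] > 0:
                lower = M[:r] + (M[r] - 1,) + M[r + 1:]
                total += xr * g[lower]
        g[M] = total
    return g


def direct_convolution(g_prev, x, M):
    """Direct sum of product_factor(N) * g_prev(M - N) over 0 <= N <= M."""
    total = 0.0
    for N in product(*(range(m + 1) for m in M)):
        rest = tuple(m - n for m, n in zip(M, N))
        total += product_factor(N, x) * g_prev[rest]
    return total
```

In this sketch the recurrence needs only the previous table and the one being filled, whereas the direct sum revisits every smaller vector for each M; the two agree to rounding error on small test cases.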
APPENDIX B


GLOSSARY 


ASSIGN            the assignment algorithm

BIN               the BIN algorithm

balanced system   a system in which every ICS is identical

C                 coefficient of variation

EPF               extended product form model

FAC               factorial expansion algorithm

FCFS              first-come-first-served scheduling policy

ICS               independent computing system

IDBS              inventory data base subsystem

IS                scheduling policy consisting of an infinite number of servers,
                  i.e., no scheduling policy

iid               statistically independent and identically distributed

LAN               local area network

LCFS              last-come-first-served preemptive-resume scheduling policy

LSI               large scale (circuit) integration

M/G/1             a single server queue with Poisson arrivals and a general
                  service time distribution

M/G/1/K           a single server queue with Poisson arrivals and a general
                  service time distribution, with a queueing limit of K jobs

M/M/1             a single server queue with Poisson arrivals and an exponential
                  service time distribution

M/M/1/K           a single server queue with Poisson arrivals and an exponential
                  service time distribution, with a queueing limit of K jobs

POS               point-of-sales

PPU               peripheral processing unit

PS                processor sharing scheduling policy

Relative error    (Exact value - Approximate value)/(Exact value)

SCS               Shared Central Server

SOP               sum-of-products expansion algorithm

SPR               shared processing resource


APPENDIX C


MATHEMATICAL  NOTATION 


mean arrival rate for device (i,j)


$A_{i,j}$   the busy probability of device (i,j)


$\beta$   assumed factor by which the SPR processing rate is increased


$e_{i,j} = \sum_{t=0}^{L-1} \sum_{r=1}^{R} e_{t,r}\, p_{t,r;\,i,j}$   relative visit frequency for device (i,j)


$d_i = (b_1, \ldots, b_R)$   a unit vector such that $b_r = 0$ for $r \ne i$ and $b_r = 1$ for $r = i$, $r = 1, \ldots, R$


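$f_i(N_i)$   product factors

\[
f_i(N_i) =
\begin{cases}
n_i! \displaystyle\prod_{r=1}^{R} \frac{x_{i,r}^{\,n_{i,r}}}{n_{i,r}!} & \text{PS, FCFS, LCFS} \\[2ex]
\displaystyle\prod_{r=1}^{R} \frac{x_{i,r}^{\,n_{i,r}}}{n_{i,r}!} & \text{IS}
\end{cases}
\]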

$G(K) = G(J)$   normalization constant

$g(M;k)$   auxiliary iterative function used in the computation of the normalization constant

$h(i, J_i)$   aggregate of the auxiliary iterative function

$i,j$   index denoting device j within the i-th ICS

$J = (J_1, \ldots, J_R)$   the job class allocation vector for the network

$J_i = \sum_{r=0}^{s_i} n_{i,r}$   the maximum number of jobs that can be allocated to the i-th class (i-th ICS)

$J_r = J_i$ for $i$ and $r = 1, \ldots, R$ when the network is balanced; in Chapter IV, $J$ is used as a generic scalar such that $J = J_i$


$K = \sum_{i=1}^{R} J_i = \sum_{i=1}^{R} \sum_{r=0}^{s_i} n_{i,r}$   the total number of jobs in the network

$K = R J_i$, $i = 1, \ldots, R$, when the network is balanced

$L = \sum_{i=0}^{R} s_i$   total number of devices in the network


$M = (m_1, \ldots, m_R)$   a dummy counting vector that may range over the job allocation vector, such that $0 \le m_i \le J_i$

$\|M\| = \prod_{i=1}^{R} (m_i + 1)$   the vector range (product)

$|M| = \sum_{i=1}^{R} m_i$   the vector value (summation)
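
As a small illustration of these two measures, the following Python sketch computes them for a sample vector. The function names are hypothetical, and the observation that the vector range equals the number of integer vectors N with 0 <= N <= M is simple counting rather than a statement taken from the text.

```python
from itertools import product
from math import prod


def vector_range(M):
    """||M||: product of (m_i + 1) over all components."""
    return prod(m + 1 for m in M)


def vector_value(M):
    """|M|: sum of the components."""
    return sum(M)


def vectors_up_to(M):
    """All integer vectors N with 0 <= n_i <= m_i, in lexicographic order."""
    return list(product(*(range(m + 1) for m in M)))


M = (2, 3, 1)
print(vector_range(M), vector_value(M), len(vectors_up_to(M)))
```

A table indexed by every such N would therefore need `vector_range(M)` entries.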


$n = (n_0, \ldots, n_R)$   state vector description of the network
$\;\; = (n_{1,0}, \ldots, n_{1,s_1}, \ldots, n_{R,0}, \ldots, n_{R,s_R})$

$n_{i,r}$   number of class i jobs at the r-th service center of the i-th ICS (both in service and in queue)

$n_i$   the total number of jobs (of all classes) at the i-th ICS or the SPR

\[
n_i =
\begin{cases}
\displaystyle\sum_{r=1}^{R} n_{i,r} & i > 0 \\[2ex]
\displaystyle\sum_{r=1}^{R} n_{r,0} & i = 0
\end{cases}
\]

$N_0 = (n_{1,0}, \ldots, n_{R,0})$   the job class distribution at the SPR


$O[X]$   the order of complexity of X

$\Pr[n_{1,0}, \ldots, n_{R,s_R}]$   state equilibrium probabilities

$p_{i,r;j,t}$   the probability of a class r job at the i-th service center transiting to the j-th service center and becoming a class t job

$p_{i,j}$   transition probability from the i-th CPU to the j-th device in the i-th ICS or to the SPR, where $0 \le j \le s_i$

$p_j = p_{i,j}$, $i = 1, \ldots, R$ and $j = 0, \ldots, s_i$, when the network is balanced

$P[n_{i,j} \ge k] = \sum_{n \,\ni\, n_{i,j} \ge k} P(n)$   the marginal probability that device (i,j) is serving k or more jobs

$Q_{i,j}$   the mean queue length of device (i,j)

$\rho_{i,j}$   traffic intensity of device (i,j)


$R$   number of ICSs in the network, which is equal to the number of different classes in the network

$S = (s_0, \ldots, s_R)$   the device allocation vector

$s_i$   number of devices in the i-th ICS (i > 0) or the SPR (i = 0)

$s = s_i$, $i = 1, \ldots, R$, when the network is balanced

$T_{i,j}$   average throughput of device (i,j)

$u_{i,j}$   mean processing rate of device (i,j)

$u_j = u_{i,j}$, $i = 1, \ldots, R$ and $j = 0, \ldots, s_i$, when the network is balanced

$W_{i,j}$   mean wait time at device (i,j)

$x_{i,j}$   relative load factor of device (i,j)


APPENDIX D


SAMPLE  SPACE  FROM  ASSIGN  ALGORITHM 


S = 2, 6, 11

P(SPR) = 0.100, 0.250, 0.500, 0.750, 0.900

TOTAL C COMBINATIONS = 15


1.  P(SPR) = 0.100
S = 2

            1       2
P(J)=   0.132   0.768
U(J)=   1.000   1.000

2.  P(SPR) = 0.250
S = 2

            1       2
P(J)=   0.182   0.568
U(J)=   1.000   0.020

3.  P(SPR) = 0.500
S = 2

            1       2
P(J)=   0.024   0.476
U(J)=   1.000   1.000

4.  P(SPR) = 0.750
S = 2

            1       2
P(J)=   0.010   0.240
U(J)=   1.000   0.100

5.  P(SPR) = 0.900
S = 2

            1       2
P(J)=   0.029   0.071
U(J)=   1.000   5.000

6.  P(SPR) = 0.100
S = 6

            1       2       3       4       5       6
P(J)=   0.276   0.374   0.006   0.125   0.026   0.093
U(J)=   1.000   0.200   0.020   0.100   0.020   0.020

7.  P(SPR) = 0.250
S = 6

            1       2       3       4       5       6
P(J)=   0.386   0.083   0.077   0.020   0.037   0.147
U(J)=   1.000   0.200  10.000   0.200   0.010   0.100

8.  P(SPR) = 0.500
S = 6

            1       2       3       4       5       6
P(J)=   0.164   0.139   0.009   0.081   0.071   0.036
U(J)=   1.000   2.000   0.020   5.000   0.200   0.100

9.  P(SPR) = 0.750
S = 6

            1       2       3       4       5       6
P(J)=   0.054   0.051   0.025   0.011   0.068   0.041
U(J)=   1.000   0.020  10.000   0.100   0.200   0.500

10.  P(SPR) = 0.900
S = 6

            1       2       3       4       5       6
P(J)=   0.061   0.003   0.004   0.001   0.013   0.018
U(J)=   1.000   0.500   0.020   0.500   0.010   2.000

11.  P(SPR) = 0.100
S = 11

            1       2       3       4       5       6       7       8       9      10      11
P(J)=   0.376   0.259   0.046   0.138   0.006   0.025   0.020   0.001   0.011   0.013   0.005
U(J)=   1.000   0.500   1.000   0.050   5.000   0.020  10.000   5.000   0.200   1.000   0.010

12.  P(SPR) = 0.250
S = 11

            1       2       3       4       5       6       7       8       9      10      11
P(J)=   0.551   0.040   0.017   0.089   0.035   0.007   0.003   0.001   0.005   0.001   0.001
U(J)=   1.000   2.000  10.000  10.000  10.000   2.000   0.010   0.200   0.020   0.200   0.020

13.  P(SPR) = 0.500
S = 11

            1       2       3       4       5       6       7       8       9      10      11
P(J)=   0.288   0.042   0.016   0.098   0.007   0.004   0.028   0.008   0.001   0.001   0.007
U(J)=   1.000   1.000   5.000   5.000   0.010   0.010   1.000   1.000   0.100   2.000   1.000

14.  P(SPR) = 0.750
S = 11

            1       2       3       4       5       6       7       8       9      10      11
P(J)=   0.088   0.113   0.006   0.013   0.012   0.004   0.006   0.005   0.001   0.001   0.001
U(J)=   1.000   0.020  10.000   0.020   0.100   2.000   0.050  10.000   0.200   2.000   0.020

15.  P(SPR) = 0.900
S = 11

            1       2       3       4       5       6       7       8       9      10      11
P(J)=   0.003   0.027   0.024   0.006   0.005   0.020   0.006   0.001   0.006   0.001   0.001
U(J)=   1.000   5.000   0.050   1.000   0.010   1.000  10.000   5.000   0.100   5.000   0.500
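
One quick sanity check on the listing is that, in each case, the P(J) values and P(SPR) add up to one. The Python sketch below checks this for cases 1 through 6 using the figures printed above. Reading P(J) as branching probabilities that, together with P(SPR), form a complete distribution is an assumption made for this check, and the names `cases` and `total_probability` are hypothetical.

```python
# (P(SPR), [P(J) values]) copied from cases 1-6 of the listing.
cases = {
    1: (0.100, [0.132, 0.768]),
    2: (0.250, [0.182, 0.568]),
    3: (0.500, [0.024, 0.476]),
    4: (0.750, [0.010, 0.240]),
    5: (0.900, [0.029, 0.071]),
    6: (0.100, [0.276, 0.374, 0.006, 0.125, 0.026, 0.093]),
}


def total_probability(p_spr, p_devices):
    """Sum of the SPR probability and the listed per-device probabilities."""
    return p_spr + sum(p_devices)


totals = {case: total_probability(*row) for case, row in cases.items()}
for case, total in totals.items():
    print(case, round(total, 3))
```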